Journal of Applied Psychology 


Edited by Donald G. Paterson, University of Minnesota 
Consulting Editors 


George K. Bennett, Psychological Corporation 
Walter V. Bingham, Washington, D. C. 
Harold E. Burtt, Ohio State University 
Allen L. Edwards, University of Washington 
Clifford E. Jurgensen, Minneapolis Gas Co. 
Irving Lorge, T. C. Columbia University 
Quinn McNemar, Stanford University 


Alexander Mintz, City College of New York 
James P. Porter, Danville, Illinois 
Julian B. Rotter, Ohio State University 
Edward K. Strong, Jr., Stanford University 
Donald E. Super, T. C. Columbia University 
Morris S. Viteles, University of Pennsylvania 
Alfred C. Welch, Knox-Reeves, Minneapolis 





Table of Contents 


The Administrative Judgment Test: M. M. Mandell 
Menstruation and Industrial Efficiency: II. Quality and Quantity of Production: A. J. Smith 


Cross-Validation of Clerical Aptitude Tests: E. N. Hay 
A Test Battery for Actuarial Clerks: A. Poruben, Jr. 


Changes in Subjective Fatigue and Readiness for Work During the Eight-Hour Shift: J. W. 
Griffith, W. A. Kerr, and T. B. Mayo, Jr 


Accident Proneness of Factory Departments: W. A. Kerr 

The Rank-Comparison Rating Method: R. H. Bittner and E. A. Rundquist 

oo | an i a Key on a Short Industrial Personality Questionnaire: E. R. Carr 
H. F. Ro 


oi Your Fv Across by Plain Talk: A. O. England 

Prediction of Academic Success in Three Schools of Nursing: A. H. Ford 

Critical Requirements for Dentists: R. F. Wagner 

The Intra-Individual Relationship Between Interest and Ability: S. M. Wesley, D. Q. 


A Project Test for Vocational Research and Guidance at the College Level: R. B. Ammons, 
M. N. Butler, and S. A. Herzig 





American Psychological Association 
Vol. 34, No. 3 





Journal of Applied Psychology 


Published Bi-monthly by the American Psychological Association, Inc. 
Prince and Lemon Sts., Lancaster, Pa. 


Annual subscription, $6.00; single copies, $1.25 


Subscriptions and business communications should be sent to 
American Psychological Association 
1515 Massachusetts Avenue N.W. 
Washington 5, D. C. 


Articles for publication and books for review should be sent to the Editor 


Professor Donald G. Paterson, Department of Psychology 
University of Minnesota, Minneapolis 14, Minnesota 





This Journal gives prompt consideration to 
manuscripts reporting original investigations in 
any field of applied psychology except clinical 
and consulting psychology. A descriptive or 
theoretical article is occasionally accepted if it 
deals in a distinctive manner with a problem of 
applied psychology. The policy is, however, to 
favor papers dealing with quantitative investi- 
gations of direct value to psychologists working 
in the following fields: Vocational diagnosis and 
occupational guidance; educational diagnosis, 
prediction and guidance at the secondary school 
level and higher; personnel selection, training, 
placement, transfer and promotion in business, 
industry and government service including the 
armed forces; supervisory training in business, 
industry and government; bio-mechanics or de- 
sign of machines to fit the human operator; il- 
lumination, ventilation and fatigue in industry; 
job analysis, description, classification and eval- 
uation; measurement of morale of executives, 
supervisors, or employees; surveys of opinion on 
social or political issues, such as those conducted 
by The Psychological Corporation; psychological 
problems in market research and in advertising. 


Articles may be under 500 words. The maximum
is 12,000 words, the average in the
neighborhood of 4,000 words. To reduce lag of
publication, adherence to the rule of “brevity
consistent with clarity” is encouraged. 

A lapse of six to twelve months occurs between 
acceptance of an article and its publication, the 
lag varying with the rate at which manuscripts 
are submitted. If, however, an author is pre- 
pared to defray the costs of printing the neces- 
sary extra pages, he may arrange for earlier 
publication without thereby postponing the ap- 
pearance of manuscripts by other contributors. 
This enables the management to provide space in 
addition to the scheduled 64 pages per issue. 
“Early publication” is thus a direct contribution 
to the subscribers. By cutting down lag in pub- 
lication, it also benefits those authors whose 
articles are published in regular turn. 


Tables, footnotes and references as well as 
text of manuscripts should be typed double-spaced 
throughout. Authors should adhere to the con- 
ventions described by J. E. Anderson and W. 
L. Valentine in “The preparation of articles for 
publication in the journals of the American 
Psychological Association,” Psychol. Bull., 1944, 
41, 345-376. A reprint of this article will be 
loaned to any prospective contributor who does 
not find it in his library. 


Entered as second-class matter, August 19, 1943, at the post office at Lancaster, Pa., under the act of March 3, 1879 
Acceptance for mailing at the special rate of postage provided for in paragraph (d-2), Section 34.40, 
P. L. & R. of 1948, authorized October 10, 1947 
Copyright, 1950, by The American Psychological Association, Inc. 





Journal of Applied Psychology 








Vol. 34, No. 3 


June, 1950 








The Administrative Judgment Test * 


Milton M. Mandell 
United States Civil Service Commission, Washington, D. C. 


The development of valid methods for selec- 
ting persons to carry the primary responsibility 
of directing our large corporations and govern- 
ment agencies is still a major task in the field 
of personnel administration. 

Ample justification for extensive study of 
this question is found in the fact that analysis 
of unsuccessful organizations so often indicates 
that the lack of success is due to lack of admin- 
istrative ability. 

Further inducement may be found in the 
comment of the late Professor John G. Jenkins 
that psychologists should be willing to under- 
take projects in areas of great significance even 
though the experimental conditions are not so 
precise as would be desirable. 

Some psychologists have recognized the 
qualitative importance of the field of admin- 
istrative selection. The broad-gauge work of 
Carroll Shartle and his group at Ohio State 
University in the general field of administra- 
tion, with some attention to selection, should 
produce fundamental facts that will be of in- 
valuable aid to those interested in the selection 
of executives and administrators. 

One can venture the opinion that while much 
remains to be known in this field, there are al- 
ready some established facts which can be used 
as working hypotheses. For example, it is 
reasonable to hypothesize that there are both 
common elements and special elements in all 
administrative jobs. All positions which can
properly be designated as executive or administrative
have certain common elements; however,
there are also such special elements as
the amount of verbal ability required, the
amount of persuasive skill required, the tempo
of operations required, and other factors. Any
good selection program for these positions will
take into account these common and special
factors. 

* Other reports on the experimental work of the
United States Civil Service Commission in the administrative
field are available in the following articles:
Mandell, M. M., The selection and ction of a
personnel staff, Personnel, 1948, 25, 125-127; Mandell,
M. M., and Adkins, Dorothy C., The validity of
written tests for the selection of administrative personnel,
Educ. psychol. Measmt., 1946, 6, 293-313; Mandell,
M. M., Testing for administrative and supervisory
positions, Publ. Personnel Rev., 1948, 9, 190-193. 

It will be noted that there is no attempt here 
to define precisely administrative positions or 
executive positions. Job analyses indicate 
that program planning and coordination are 
essential characteristics of administrative and 
executive positions. Therefore, the following 
working definition is offered: An executive job 
or administrative job is one in which more than 
50 per cent of the time is devoted to program 
planning and coordination. This does not, 
obviously, include all the elements of the 
administrative job but it does include two 
elements which will probably always be found. 

The purpose of this brief report is to describe 
the administrative-judgment test which has 
been developed by the United States Civil 
Service Commission as part of its program of 
research in the field of administrative selection. 
The reason for describing this one test is that 
it is the test which has given the most con- 
sistently satisfactory results and the one on 
which the most experimentation has been done. 
While the data below are based on only 171 
cases, they represent four different samples and 
two different types of criteria. The relative 
consistency of the results among these different 
samples and the use of different criteria offer a 
basis for the belief that the test is measuring 
elements which are essential to administrative 
success. 


The Test. The administrative-judgment test
is in 5-choice form. In the housing studies 
referred to below, 100 items were used; in the 
two other studies, 80 items were used. The 
test attempts to measure broad understanding 
of the processes of administration. It attempts 
to measure the understanding of the adminis- 
trative problems of large organizations, whether 
government or private. The questions at- 
tempt to measure the common elements in the 
administrative process. They include prob- 
lems in the relationships between the head- 
quarters and field offices in an organization, and 
those between research and operating person- 
nel. They also include problems on the timing 
of programs and the organization of the office 
of an administrator. The test does not at- 
tempt to measure technical knowledge in such 
fields as personnel or budgeting or accounting. 
It attempts, as far as possible, to divorce its 
contents from complete dependence on basic 
training, and attempts to emphasize problems 
which can be evaluated on the basis of observa- 
tion and experience or training. The split- 
half reliability of the administrative-judgment 
test is .94 based on the group of 258 cases on 
which scores were available. Below is a sample 
item from this test: 


Which one of the following administrative
situations or problems will most probably
occur when direct relations are permitted
between a staff specialist employed by the
national office of an organization and the
operating officials employed in the field
offices? 

A) decrease in the feeling of responsibility
   of national office specialists for
   the operations of state programs in
   their specialties 
B) inadequate technical supervision of
   field office operations 
C) inadequate knowledge in the national
   office of the competence and qualifications
   of field office personnel 
D) difficulty in keeping the relations on
   an advisory basis 
E) subordination of professional considerations
   to general administrative
   responsibilities 
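The split-half figure quoted above can be illustrated with a minimal sketch. The report does not say how the test was divided or whether the Spearman-Brown step-up was applied; the odd-even split and the correction used here are therefore assumptions, and the item responses are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores holds one list of 0/1 item scores per examinee.

    Assumed procedure: odd-numbered and even-numbered items form the halves.
    """
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


# Hypothetical responses of five examinees to six items (not data from the study).
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
reliability = split_half_reliability(responses)
```

With a real answer matrix of 80 or 100 items the same function would be applied unchanged; only the hypothetical `responses` list would be replaced.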


Criteria Used. Two basic types of criteria
have been used in the experiments with this
test. The first type, job performance, represents
the collective ratings of colleagues and
superiors. In all these cases except one, the
job performance ratings are a composite of
graphic ratings and paired comparisons; in 
rating one group of 20 line administrators, only 
graphic ratings were used. In addition to job- 
performance ratings, the position grade or 
salary of the subject was also used as a crite- 
rion. An average of more than four inde- 
pendent ratings for each subject in the study 
was obtained. Because of the small samples 
involved, practically no cases have been elimi- 
nated from the samples because of disagree- 
ment among raters. No cases were eliminated 
from the housing and Veterans Administration 
studies for this reason. In the case of the 
Navy study, approximately half of the cases 
were eliminated because of substantial dis- 
agreement among the raters. This greater 
divergence in the Navy Department as com- 
pared with the other groups included in the 
study can possibly be explained on the basis 
of the greater size of the Navy Department 
and the greater number of bureaus that the 
subjects worked in. The fact that 6 of the 7 
correlations represent no elimination of cases 
lends further weight to the data that are 
presented. In addition, the scoring key used 
was determined in advance; in other words, the 
key that was used is not based upon item analy- 
sis on a particular group. In all cases Pearson 
product-moment correlations were used. 

Population. The first group studied were 
63 persons in personnel, budget, and organiza- 
tion analysis work in two major housing 
agencies in the Federal government. They 
are receiving salaries of between $3,000 and 
$9,400. The group at the Veterans Adminis- 
tration consisted of 42 persons in the personnel 
office of that agency receiving salaries between 
$3,000 and $6,000. These persons were en- 
gaged in all types of personnel work. The 
Navy group represents persons in several 
bureaus of the Navy Department engaged in 
various phases of personnel, office methods, and 
organization analysis work with salaries be- 
tween $3,000 and $8,400. The last group, also 
from the housing agencies, consisted of 20 
persons, with salaries of between $7,500 and 
$10,000, who are responsible for administering 
major segments of the Government’s housing 
program. 

In addition to the data obtained for the 
administrative-judgment test, validity coeffi- 
cients are presented for tests in the general 







field of mental ability. For the housing groups, 
the test involved was the American Council 
on Education Psychological examination for 
college freshmen. The total score on the test
was used as the predictor. For the other 
groups, the test consisted of 25 vocabulary 
items in multiple-choice form. The reliability 
of the vocabulary test is .84 for a sample of 
42 cases with a standard deviation of 5.6, 
using Kuder-Richardson Formula No. 21. 
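Kuder-Richardson Formula No. 21 needs only the number of items, the mean score, and the standard deviation. The sketch below is illustrative: the 25 items and the standard deviation of 5.6 are taken from the text, but the mean score is not reported, so the value of 14.6 is a hypothetical figure chosen only because it yields a coefficient near .84.

```python
def kr21(n_items, mean_score, sd_score):
    """Kuder-Richardson Formula 21: k/(k-1) * (1 - M(k-M) / (k * s^2))."""
    variance = sd_score ** 2
    return (n_items / (n_items - 1)) * (
        1 - mean_score * (n_items - mean_score) / (n_items * variance)
    )


# 25 items and SD = 5.6 are from the text; the mean of 14.6 is hypothetical.
estimate = kr21(25, 14.6, 5.6)
```

Because the formula is symmetric in M and k - M, a mean of 10.4 would give the same estimate; the report does not allow a choice between them.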
The purpose in presenting these data for men- 
tal-ability tests is to demonstrate that the rela- 
tive validities for this test are substantially 
lower, in general, than for the administrative- 
judgment test, despite the fact that the inter- 
correlation between the administrative-judg- 
ment test and the American Council on Educa- 
tion test is +.69 while the intercorrelation be- 
tween the administrative-judgment test and 
the vocabulary test is +.59. A hasty inspection
of these intercorrelations would lead to the
belief that the mental-ability and administrative-judgment
tests are measuring the same
factors; actually, the validity coefficients belie
this conclusion. The data would indicate that
the only proper conclusion is that it is not
necessary to use a mental-ability test along
with the administrative-judgment test because
the multiple correlation of these two tests
would not be sufficiently greater to justify the
addition of the mental-ability test. 

Table 1 

Product-Moment Correlations for the Administrative-
Judgment and Mental-Ability Test with Job
Performance and Grade Criteria 

Agency        N    Criterion 
1. Housing    63   Job Performance 
              63   Grade 
2. VA         42   Job Performance 
              42   Grade 
3. Navy       22   Job Performance 
              46   Grade 
4. Housing    20   Job Performance 

Column headings: Test of Administrative Judgment; Test of Mental Ability 
Coefficients: +.30, +.38, +.52, +.26, +.13, +.21, +.64 
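The argument about the multiple correlation can be checked with the usual two-predictor formula. The sketch below is an illustration only: it combines the two median validities given in the Summary (+.51 and +.30) with the +.69 intercorrelation, although these figures come from different samples and the report itself does not present such a computation.

```python
from math import sqrt


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors.

    r_y1, r_y2: validity of each predictor; r_12: their intercorrelation.
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)


# Illustrative inputs: median validities from the Summary and the +.69 intercorrelation.
combined = multiple_r(0.51, 0.30, 0.69)
gain = combined - 0.51  # improvement over the administrative-judgment test alone
```

Under these assumed inputs the combined coefficient exceeds .51 by less than .01, which is in keeping with the conclusion drawn in the text.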

Additional Test Group. The latest trial of 
the administrative judgment test involved its 
administration to 30 persons being trained for 
line and staff administrative positions in the 
State Department. These persons receive 
salaries of approximately $3,000 a year. The 
criterion was the collective opinion of the 
supervisors for whom they have worked dur- 
ing their period of internship. The Pearson 
product-moment correlation for the adminis- 
trative judgment test with this criterion was 
+.60. The validity coefficient, using the same 
criterion, for the vocabulary test referred to 
above was +.23. 
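The two State Department coefficients can be compared for statistical significance. The report does not state which test of significance was used; Fisher's z approximation is substituted here as a convenient stand-in, so the probabilities below are approximate.

```python
from math import atanh, sqrt
from statistics import NormalDist


def fisher_z_p_value(r, n):
    """Approximate two-sided p-value for the hypothesis of zero correlation."""
    z = atanh(r) * sqrt(n - 3)  # Fisher z divided by its standard error 1/sqrt(n-3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# r = +.60 and +.23 with N = 30 are the figures reported for the State Department group.
p_judgment = fisher_z_p_value(0.60, 30)
p_vocabulary = fisher_z_p_value(0.23, 30)
```

On this approximation the administrative-judgment coefficient falls well beyond the 1 per cent level, while the vocabulary coefficient does not reach the 5 per cent level.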

Summary 


The median validity coefficient for the administrative-judgment
test is +.51; the median
validity coefficient for the mental-ability test
is +.30. Six of the seven coefficients for the
administrative-judgment test are significant
at the 1 per cent or 5 per cent level of confidence,
while the significance levels of the coefficients
for the mental-ability tests are in general much
lower. 

These data are offered as a basis for further 
experimentation in other situations in order 
to determine the value of the administrative- 
judgment type of test for executive positions. 


Received September 9, 1949. 











Menstruation and Industrial Efficiency. II. Quality and 
Quantity of Production 
Anthony J. Smith 


University of Kansas 


The present paper continues the report of an 
investigation of the relationships between the 
various phases of the menstrual cycle and 
certain measures of industrial efficiency. In 
all, four criteria of industrial efficiency were 
studied. In a previous paper (5) the relations 
between menstrual function and absence rate 
and activity level were examined. 

In view of the common contention that 
accuracy of performance deteriorates during 
the premenstrual and menstrual phases, it 
would seem to be desirable to investigate this 
hypothesis in an industrial situation. To the 
knowledge of the author, such a study had not 
been undertaken previously. 

Furthermore, it is frequently assumed that 
quantity of production decreases during certain 
phases of the menstrual cycle. Several reports 
of experimental studies are available on this 
point, but the results are not consistent. 
Kirihara (3) and Meeker (7) report detrimental 
effects related to the menstrual phase, whereas 
Nowikowa (2), Gorkin and Brandis (2), and 
Anderson (1) report no differential effects. 
The area is obviously in need of further 
investigation. 


Procedure and Analytic Techniques 


Aircraft Factory. Twenty-nine women employed
in the electrical department of an aircraft
factory were studied over a period of
forty-one days.¹ All of their work underwent
a routine inspection and complete records were
kept including the employee’s clock number,
the unit number, the date, and the defects
discovered. In order to obtain data on menstrual
function, these women were assembled
on company time by the personnel department
and requested to participate in an experiment
in which the lengths of the menstrual phase
and cycle were being studied. All of the
women agreed to participate and later submitted
daily menstrual data to a member of
the physical education department. 

¹ The women who served as subjects in this and the
following factories were, with the exceptions noted, the
same women described in the first paper (5). 

In analyzing the data, the menstrual cycle 
was broken down into a five day premenstrual 
phase, the menstrual phase (period of flow), a 
seven day postmenstrual phase, and an inter- 
menstrual phase. 

The effect of the menstrual cycle upon 
quality of production was determined by com- 
puting the number of error days (during which 
at least one piece was rejected) and the number 
of errorless days for all women for each of the 
four phases of the cycle. The data were then 
recorded in a two-by-four table and the signifi- 
cance of the variations tested by means of the 
chi-square test. The simpler menstrual-non- 
menstrual analysis was also performed. 
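The two-by-four chi-square computation can be sketched as follows. The counts for the premenstrual, menstrual, and postmenstrual phases are those of Table 1 of this paper; the intermenstrual counts (48 error days, 261 errorless days) are not legible in the table and are back-calculated here from the printed theoretical frequencies, so they should be read as an assumption. The closed-form probability applies only to three degrees of freedom.

```python
from math import erfc, exp, pi, sqrt


def chi_square(observed):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # theoretical frequency
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


def p_value_df3(statistic):
    """Upper-tail probability of chi-square with exactly 3 degrees of freedom."""
    return erfc(sqrt(statistic / 2)) + sqrt(2 * statistic / pi) * exp(-statistic / 2)


# Columns: premenstrual, menstrual, postmenstrual, intermenstrual.
# The last column (48, 261) is back-calculated, not read from the table.
table = [
    [18, 17, 19, 48],      # error days
    [132, 144, 176, 261],  # errorless days
]
statistic, df = chi_square(table)
probability = p_value_df3(statistic)
```

With the assumed intermenstrual counts the probability falls close to the value of .20 reported in the results, which lends some support to the back-calculation without confirming it.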

An attempt was made to derive a measure 
of quality of production that was more sensitive 
than that involving the mere presence or 
absence of defective units. However, after 
several conferences with the supervisory staff 
it was apparent that there could be no agree- 
ment that would permit one to locate on a 
quality continuum inspection records differing 
with respect to kinds and numbers of errors. 
Hence, the cruder measure was retained. 

Parachute Factory. The final criterion to be 
investigated was quantity of production. A 
group of employees in a parachute factory was 
contacted by the company nurse and each 
woman was given an “explanation” of the ex- 
periment similar to that previously described. 
Forty-six women offered to participate and 
supplied the nurse with the necessary data on 
menstrual function. The women in this com- 
pany were paid on a piece rate system with the 
result that the personnel department main- 
tained accurate individual records including 
daily production, kind of work, and number 
of hours worked. 

For various reasons production records were
not available for five of these subjects. Consequently,
the analysis was based on data from
the remaining forty-one women. 

Examination of individual records disclosed 
that the number of hours worked per day 
varied from person to person, from shift to 
shift, and also for the same individual. This 
made it necessary to determine for each day 
the average hourly rate of production for each 
individual performing her usual work. 

It became evident that differences in effi- 
ciency among the various women and differ- 
ences in the types of units produced on the 
diverse jobs would make the analysis of these 
raw scores (hourly production rate) undesir- 
able, because variations at the many ability 
levels and among jobs would then have un- 
equal effects. To meet this difficulty com- 
parable derived scores had to be computed. 

At this point in the analysis, each individual 
was treated separately. The individual’s aver- 
age hourly rates through the total interval in- 
vestigated were examined. A measure of 
variability was obtained (standard deviation) 
and each single hourly rate was replaced by a 
standard score. Once these standard scores 
had been derived, individual consideration was 
discontinued. 
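The conversion of each worker's hourly rates to standard scores can be sketched as below. The rates are hypothetical, and the use of the population standard deviation (divisor N) is an assumption, since the paper does not say which divisor was used.

```python
from statistics import mean, pstdev


def standard_scores(hourly_rates):
    """Express one worker's daily hourly rates relative to her own mean and SD."""
    m = mean(hourly_rates)
    s = pstdev(hourly_rates)  # assumed divisor N; the paper does not specify
    return [(rate - m) / s for rate in hourly_rates]


# Hypothetical rates for two workers on different jobs and at different ability levels.
worker_a = [10.0, 12.0, 14.0]
worker_b = [40.0, 50.0, 60.0]
z_a = standard_scores(worker_a)
z_b = standard_scores(worker_b)
```

The two workers differ fivefold in raw output yet receive identical standard scores, which is the property that allows the scores of all subjects to be pooled.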

The first analysis involved those forty-one 
subjects for whom production records were 
available. A distribution of all standard 
scores falling within the first intermenstrual
period was made and the mean and standard 
deviation were computed. Similar distribu- 
tions were made for the subsequent phases 
(premenstrual, menstrual, postmenstrual, in- 
termenstrual, etc.). These means were then 
graphed with average standard scores for each 
phase being plotted along the ordinate while 
time measurements were recorded along the 
abscissa. (As an example, see Figure 1.) 

The matter of plotting the various elements 
of the cycles along the time axis posed a further 
problem. These phases could not be plot- 
ted at equal intervals along the axis for 
they did not represent equal periods of time. 

Consequently, each phase was represented 
by a distance that was in the same proportion 
to the total distance along the axis that the 
number of days in the given phase was to the 
total number of days studied. When the mean 
production in standard units was plotted for 
each phase of the cycle, the points were entered 
at the midpoints of their corresponding inter- 
vals along the abscissa. 
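The proportional spacing of the phases along the time axis can be sketched as follows. Only the five-day premenstrual and seven-day postmenstrual lengths are fixed by the paper; the other phase lengths below are hypothetical, as are the function and variable names.

```python
def phase_midpoints(phase_days, axis_length=1.0):
    """Return (phase, position) pairs with each phase plotted at the midpoint
    of an interval proportional to its share of the total days."""
    total = sum(days for _, days in phase_days)
    positions = []
    start = 0.0
    for name, days in phase_days:
        width = axis_length * days / total
        positions.append((name, start + width / 2))
        start += width
    return positions


# Hypothetical cycle: the inter- and menstrual lengths are illustrative only.
phases = [("inter", 11), ("pre", 5), ("men", 5), ("post", 7)]
positions = dict(phase_midpoints(phases, axis_length=28.0))
```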

It will also be seen that there is an increase 
in mean production in succeeding time periods. 
This would indicate that a learning factor was 
operating, as might be anticipated. Improve- 
ment, whether it be small or great, would be 
reflected by an analysis involving standard 
scores. Furthermore, even assuming that 
some of the subjects have reached a plateau
while the remainder are improving to some
extent, the composite curve would exhibit this
learning. 

Fig. 1. Mean rates of production during successive menstrual phases. Parachute factory: Ages 29-38. (Ordinate: mean production in standard scores; abscissa: phases of successive menstrual cycles; obtained curve and fitted curve shown.) 

The learning factor had to be eliminated 
before other variations in performance could 
be studied. Ideally, this would be effected by 
fitting a theoretical learning curve to the data 
and, thereafter, considering only variations 
about the curve. However, in this case no 
theoretical learning curve was available and it 
was necessary to fit a smooth curve to the data 
that would make the deviations about it a 
minimum. The curves throughout this study 
were fitted by inspection. 

From this point on, the fitted curve was 
treated as being equivalent to a zero line and 
the positive and negative deviations of the 
points about it were computed. These values 
were the new mean production measures in- 
dependent of learning, expressed in terms of 
standard scores. 
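The removal of the learning trend can be illustrated in code, with one important substitution: the curves in this study were fitted by inspection, whereas the sketch below fits a straight line by least squares. It is therefore an analogue of the procedure, not a reproduction of it, and the phase means are hypothetical.

```python
from statistics import mean


def detrend(times, phase_means):
    """Deviations of the phase means from a least-squares straight line."""
    t_bar, y_bar = mean(times), mean(phase_means)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, phase_means)) / sum(
        (t - t_bar) ** 2 for t in times
    )
    intercept = y_bar - slope * t_bar
    # The fitted line plays the part of the zero line described in the text.
    return [y - (intercept + slope * t) for t, y in zip(times, phase_means)]


# Hypothetical mean standard scores for five successive phases.
times = [1, 2, 3, 4, 5]
phase_means = [-0.4, -0.1, -0.2, 0.3, 0.4]
deviations = detrend(times, phase_means)
```

A curved learning function would call for a different fitted form; the straight line is used only to keep the illustration short.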

This was followed by a combination from 
similar periods of these new scores. A composite
intermenstrual mean was derived by 
combining the means from the three intermen- 
strual periods studied. Similar composite 
means were obtained for the other components 
of the menstrual cycle. Following this, com- 
posite variances for the four phases were com- 
puted and were then tested for homogeneity 
(4). With these four composite means and 
four composite variances available, it was 
possible to make analyses of the differential 
effects of the various parts of the menstrual 
cycle by means of an analysis of variance 
technique for each situation in which the condi- 
tion of homogeneity of variances was realized. 
In each instance, the statistic used was 
Snedecor’s F (6). 
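The F ratio can be obtained directly from the composite means and variances of the four phases. The sketch below assumes that the variances are computed with divisor N - 1 and uses hypothetical summary values; the function name is likewise an invention for illustration.

```python
def f_from_summaries(groups):
    """One-way analysis of variance from (n, mean, variance) per group.

    Variances are assumed to use the N - 1 divisor.
    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * v for n, _, v in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k


# Hypothetical composites for the pre-, menstrual, post-, and intermenstrual phases.
phases = [(10, 0.2, 1.0), (10, -0.2, 1.0), (10, 0.0, 1.0), (10, 0.0, 1.0)]
f_ratio, df_between, df_within = f_from_summaries(phases)
```

In this hypothetical case the variance among groups is smaller than the variance within groups, so F falls below 1.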

This general approach was utilized in each of 
the thirteen remaining analyses of variations in 
production. The thirteen conditions ana- 
lyzed include those to be found in Table 2, as 
well as the first shift and average mental 
difficulty groups. 

Garment Factory. Quantity of production
was also investigated at a local garment factory.
Data on rate of production were collected by
the business manager of the union, who requested
that the employees provide her with
records of their earnings, presumably for use
by the union in subsequent discussions with
management. This approach could conceivably
have had some effect on earning rate 
through the factor of suggestion. However, 
there was no reason to expect a differential 
menstrual effect. Special forms were prepared 
covering weekly periods but requesting daily 
information on number of hours worked, dozens 
of units completed, and ticket (daily) earnings. 

At approximately the same time, permission 
was granted by the union to approach indi- 
vidual members and solicit their participation 
in a research project. Immediately there- 
after, a small group of sixteen women was 
selected to act as potential subjects. These 
women were contacted by the author’s wife 
during their lunch hour and informed of 
the apparent “purpose” of the experiment. 
They then agreed to provide the information 
requested. 

It was decided at a conference involving the 
personnel manager, the union business agent, 
and the author that daily earnings were the 
best available measures of efficiency. The 
number of units completed was unsatisfactory 
because the women worked at different jobs 
requiring varying amounts of time for different 
units. Earning rates for each of the types of 
work had been determined by joint labor- 
management study and consequently reflected 
ability more adequately. These employees 
worked on a piece rate basis. Their work 
was inspected and they were required to re- 
work all rejected units. As a result, daily 
earnings were measures of quality and quantity 
of production. 

The analysis of the variations in earning rate 
was carried out in a manner similar to that 
employed at the parachute factory. Average 
hourly earning rates were computed and these 
were transformed to standard scores (each in 
terms of the subject’s own performance). 
Distributions of the standard scores of all 
subjects during each of the phases of the men- 
strual cycle were obtained. The mean scores 
for each phase were then determined and 
plotted. A curve was fitted to these points to 
eliminate the effects of learning. The devia- 
tions of the points from this curve were com- 
puted and were treated as the new mean scores. 
Means for similar phases were combined, as
were the variances, and the variances were then
tested for homogeneity. Finally, an analysis
of variance was performed on the derived
composite means and composite variances. 


Table 1 

Quality of Production: Frequency of Occurrence of Error and Non-Error Days During the
Phases of the Menstrual Cycle 

                 Premenstrual    Menstrual       Postmenstrual   Intermenstrual 
Error Day        18 (18.78)*     17 (20.16)      19 (24.41)      [illegible] 
Non-error Day    132 (131.22)    144 (140.84)    176 (170.59)    [illegible] (270.31) 

* Numbers in parentheses refer to theoretical frequencies. 


Results and Discussion 


Quality of Production. The analysis of the 
four phase data obtained from inspection re- 
cords (see Table 1) revealed differences that 
were not significant (P equals .20). The gross 
analysis contrasting menstrual with nonmen- 
strual days yielded much higher agreement be- 
tween theoretical and obtained frequencies (P 
equals .60). The trend was in the direction of 
fewer error days during the menstrual period. 

It was felt that some of the women might 
manifest an obvious detrimental effect during 
the menstrual phase and that such an effect 
would be obscured by group treatment. Al- 
though individual analyses could not be under- 
taken satisfactorily, an examination of indi- 
vidual records disclosed no tendency toward a 
greater proportion of error days in the men- 
strual period than during the nonmenstrual in- 
terval. Those persons who performed poorly 
during the period of flow tended to perform 
poorly at all times. 

The failure to discover a significant decrease 
in the quality of production during the men- 
strual phase might conceivably be explained on 
the grounds that accuracy was maintained at 
the expense of lowered production. A con- 
sideration of the results derived from the 
parachute and garment factories would seem 
to render this interpretation untenable. 

Quantity of Production. In checking the 
homogeneity of the variances of production 
rates during the several phases of the menstrual 
cycle, significant differences were encountered 
in only two instances. On the first shift, while 
the differences were significant, variability 
was highest during the postmenstrual period 
and lowest during the premenstrual period. 
Women working on jobs described as being of 
average mental difficulty also revealed signifi- 
cant differences in variability, with variability 
being greatest during the intermenstrual phase 
and lowest during the premenstrual phase. 
This decreased variability was not accompanied 
by a decrease in production. Obviously, no 
detrimental menstrual effect is indicated. 

In the analysis of the production rates of the 
remaining testable groups, two yielded significant 
differences (see Table 2). Women engaged 
in jobs involving a relatively high level 
of mental difficulty manifested significant differences 
in production rate. However, their 
lowest production occurred during the premenstrual 
phase, with highest production appearing 
in the menstrual phase and being of such a level 
as to counteract the premenstrual loss. The 
second group, women working on jobs demanding 
strenuous physical activity, displayed 
high postmenstrual production and low intermenstrual 
production. 


Table 2 

Rate of Production; Values of F and Corresponding 
Probability Values 

                          Number of Days 
                          Analyzed          F       Probability 
Parachute Factory 
  All Subjects              1711           1.67       .19 
  Shift 
    Second                   553 
    Third                    567 
  Age 
    18-28                    448 
    29-38                    576 
    39-50                    697 
  Mental Difficulty 
    Simple                   252 
    Difficult                378 
  Physical Difficulty 
    Average                 1431 
    Strenuous                280 
    Standing                 707 
    Sitting                 1004 
Garment Factory 
  All Subjects               457 

[The F and probability entries for the remaining rows are not legible. 
Legible F values that cannot be assigned to rows: 1.88, 2.03*, 3.17, 
1.16, 1.18, 4.66.] 

* Variance among groups is smaller than variance 
within groups. 
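The F values in Table 2 compare variation in production rate among the phases of the cycle with variation within phases. As a minimal sketch of that ratio, assuming an ordinary one-way analysis of variance (the article does not print its computing formulas), the function below returns F for any set of groups. The name `one_way_f` and the production figures are hypothetical and are not data from the study.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance.

    `groups` is a list of lists, one list of observations per group
    (here, one list of daily production rates per menstrual phase).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical production rates for four phases (illustration only).
phases = [
    [101, 98, 104, 99],    # premenstrual
    [103, 100, 105, 102],  # menstrual
    [106, 104, 108, 103],  # postmenstrual
    [99, 97, 102, 100],    # intermenstrual
]
f_ratio, df_b, df_w = one_way_f(phases)
```

An F below 1, as flagged by the asterisk in Table 2, simply means the variance among the phase means is smaller than the variance within phases.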

In brief, variability in production on the 
jobs studied in this investigation does not 
increase during the premenstrual or menstrual 
phases as has been claimed. If anything, it 
would appear to decrease. Furthermore, rate 
of production does not seem to decrease during 
the period of flow or during a period of possible 
“premenstrual tension,” with the exception of 
the women performing work of a high level 
of mental difficulty. As a matter of fact, 
there is evidence that a drop in production 
occurs in the intermenstruum in one instance. 
It may be that both of these “significant” differences 
could have occurred by chance. However, 
in the groups not yielding significant 
differences the intermenstrual period is often 
characterized by low production, whereas the 
postmenstrual period is usually one of high 
production. 


Summary 


This second part of the investigation was 
undertaken to determine the effects of the 
various phases of the menstrual cycle upon 
industrial efficiency as reflected in quality and 
rate of production. 

A total of eighty-six women in the aircraft 
and garment industries served as subjects in 
the three component studies. Approximately 
thirty-eight hundred individual working days 
were ultimately analyzed in the complete 
study. 

In each possible instance, two analyses were 
performed. In the gross analysis, menstrual 
performance was contrasted with nonmenstrual 
performance. The more intensive analysis was 
concerned with a comparison of performances 
during the premenstrual, menstrual, postmen- 
strual and intermenstrual phases. 




In the study of the employees of the para- 
chute factory, the subjects besides being ex- 
amined as a group were classified into sub- 
groups according to shift, mental difficulty of 
work, physical difficulty of work, age, and 
standing vs. sitting jobs to determine possible 
differential effects of the cycle. 

The study of quality of production, measured 
in terms of the presence or absence of error 
days (days on which defective units were 
worked), shows that variations are generally 
small and unrelated to the component phases of 
the menstrual cycle. 

Significant differences in variability of pro- 
duction rate occur in two of the subgroups. In 
one case, greatest variability is found in the 
postmenstrual period, while in the other, it 
occurs in the intermenstrual period. 

All of the analyses of production rate except 
two reveal no statistically significant differ- 
ences among the phases of the menstrual cycle. 
In one group intermenstrual production is low, 
while the second group displays low premen- 
strual production which is offset by high men- 
strual production. 

No one phase of the menstrual cycle yields 
losses in efficiency more frequently than any 
other phase. Where significant differences 
occur they would appear to be the result of 
situational determinants rather than menstrual 
function. 


Received September 20, 1949. 


References 


1. Anderson, Mary. Some health aspects of putting 
women to work in war industries. In Seventh 
annual meeting, Industrial Hygiene Foundation of 
America, Inc., Nov. 10-11, 1942, 165-169. 

2. Gorkin, S., and Brandis, S. Einfluss der Menstruation 
auf einige psychophysiologische Funktionen 
und auf Arbeitsfähigkeit der Frau. Arbeitsphysiologie, 
1936, 9, No. 3. 

3. Kirihara, H. Functional periodicity. Rep. Inst. 
Sci. of Labour, Kurasiki, Japan, 1932, No. 14. 

4. Rider, P. R. An introduction to modern statistical 
methods. New York: John Wiley, 1939. 

5. Smith, A. J. Menstruation and industrial efficiency: 
I. Absenteeism and activity level. J. appl. 
Psychol., 1950, 34, 1-5. 

6. Snedecor, G. W. Statistical methods. Ames: Collegiate 
Press, Inc., 1938. 

7. Women’s Work (Anon.). Occupation and Health. 
Int. Lab. Off., Geneva, 1930-1934, No. 152. 





Cross-Validation of Clerical Aptitude Tests 


Edward N. Hay 
Aptitude Test Service, Swarthmore, Pa. 


Tests have been used for fifteen years in the 
selection of clerical workers at the Pennsylvania 
Company for Banking and Trusts, Philadelphia. 
In 1941 a study was made of the prediction 
of success in machine bookkeeping,¹ 
using speed of posting as the criterion. A 
battery of three tests, Number Series Completion, 
Name Finding, and Minnesota Numbers, 
was found to give excellent results in 
predicting success in bookkeeping. 

The present study was undertaken to deter- 
mine whether it was possible to predict success 
of key-punch operators with the same tests. 
Since there appeared to be no objective way 
of ranking key-punch operators according to 
production because of differences in their jobs, 
their performance was rated by their super- 
visors. The ratings were under the general 
title of “performance,” and were divided into 
six levels, with appropriate headings, as follows: 
Group I. Quantity and quality of production 
are above normal, shows initiative, capable of 
handling greater responsibilities; Groups II 
and III. Quantity and quality of production 
are entirely satisfactory, shows some ability 
in handling unusual problems, learns new 
work with average instructions. (This section 
was divided into two sub-sections.); Groups 
IV and V. Quantity and quality of production 
are just satisfactory, requires more supervision 
than the average employee. (This group was 
subdivided into two.); and Group VI. Quantity 
and quality of production unsatisfactory. 

In rating the clerks each one was compared 
with the others as well as rated on the charac- 
teristics indicated by the headings just given. 

The ratings of the key-punch operators were 
made by the tabulating department manager 
who supervised clerks engaged in all of the 
operations usual to Remington-Rand punch 
card tabulating equipment. These particular 
clerks were employed without experience and 
were then trained to operate the Remington-Rand 
key-punch machine, which has a typewriter 
keyboard for letters and a special bank 
of keys for the numbers. Most of these 
operators were under 20 years of age and 
nearly all were under 25. Few had had office 
experience or typing training. 

¹ Hay, E. N. Predicting success in machine bookkeeping. 
J. appl. Psychol., 1943, 27, 483-493. 


The Criterion 

Eighty-two key-punch operators were rated 
who had been hired from 1941 to 1944, had 
remained on the job long enough to become 
proficient and to establish records which could 
be rated, and had taken the three tests referred 
to above. The number of key-punch operators 
rated in each of the six groups was as follows: 
Group I. 6; Group II. 16; Group III. 31; 
Group IV. 13; Group V. 9; and Group VI. 7. 

The supervisor who did the rating was aided 
by several assistants who were group leaders. 
The range of test scores was somewhat greater 
than in normal times because the labor market 
was tight in these years, and it was frequently 
necessary to lower the hiring requirements. 

The tests used in this study were: (a) Minnesota 
Clerical: Number Checking; (b) Number 
Series Completion;² and (c) Name Finding.³ 

The Minnesota Clerical Test needs no de- 
scription here. The Number Series Comple- 
tion test was the form used by Guilford in the 
Nebraska Revision of Alpha and was drawn 
from several Alpha series. The Name Finding 
test is modeled on the number test of IER 
Clerical. It consists of 25 names on the front 
of a sheet; for example, Allen B. Smith. On 
the back of the sheet are groups of four names 
together: A. C. Smith, A. B. Smyth, A. B. 
Smythe, and A. B. Smith. The subject 
reads the name on the front of the sheet and 
then turns the sheet over in order to check the 
correct choice. This operation is very much 
like that performed by a bookkeeper in turning 

from check or invoice to the correct ledger card 
for posting. 

² Obtainable from The Psychological Corporation, 
522 Fifth Ave., New York. 

³ Obtainable from Aptitude Test Service, Swarthmore, Pa. 

Test scores of the 82 key-punch operators 
were correlated with the six-step ratings and 
with each other with the results shown in 
Table 1. The Doolittle method gave a multiple 
R of .380 with these three tests. This 
multiple coefficient was disappointingly low, 
and when first obtained caused the project to 
be laid aside as not affording satisfactory prediction. 
However, when a detailed examination 
of the raw scores was made, it was apparent 
that good prediction was possible in spite of 
the low R. 


Table 1 

Intercorrelations 

Key Punch Study (N = 82) 
                       (2)             (3)             (4) 
                       Minn. Nos.      No. Series      Name Find. 
(1) Criterion          [illegible]     .25             .26 
(2) Minn. Numbers                      .04             .33 
(3) Number Series                                      [illegible] 

Bookkeeper Study (N = 39) 
                       (2)             (3)             (4) 
                       Minn. Nos.      No. Series      Name Find. 
(1) Criterion          [illegible]     [illegible]     .48 
(2) Minn. Numbers                      .34             .23 
(3) Number Series                                      .35 

Finding Cutting Scores From Scattergrams 


In order conveniently to study the scores of 
each of the three tests, scattergrams were made. 
Table 2 shows Minnesota Numbers and Table 
3 shows Name Finding. It is possible to select 
critical scores for each test from these scatter- 
grams. For example, Minnesota Numbers 
alone gives a good prediction of success. A 
score of 130 or more was made by 33, or 40% 
of the criterion group of 82 clerks, of whom 
27, or 82% of the 33 were rated “good” and 6, 
or 18% were “poor.” It has been assumed 
from the wording of the ratings that those in 
the first three rating groups were the “good” 
clerks and those in the last three groups were 
the “poor” clerks. There are 53 of the former 
and 29 of the latter. Of the clerks who scored 
less than 130 on Minnesota Numbers, there 
are about as many poorer ones as better ones. 
Accordingly, the scores below 130 on this test 
fall in “uncertain” territory. Above that point 
the ratio of better to poorer is 4.5 to 1. 

In the case of the Name Finding test, as 
shown in Table 3, the ratio of better to poorer 
(who score above 12) is not so high: 2.8 to 1. 
However, except for “uncertainty” with a score 
of 12, the prediction from scores of 11 and less 
is strongly toward “failure,” or poorer clerks, 
the ratio being 8 to 1 for failure. Another 
difference between the tests is seen in the 
critical percentages. The “passing” group 
with Minnesota Numbers (making 130 or more) 
comprises only 40% of the criterion group, of 
which 82% are good and 18% poor, whereas 
79% of the criterion group exceeded the critical 
score on Name Finding and 74% of them were 
the higher rated clerks against 26% lower 
rated. The coefficients of correlation with the 
rating criterion are .302 and .260, respectively, 
for Minnesota Numbers and Name Finding. 
These examples show how easy it is to find the 
best cutting scores for single tests even though 
the correlations are low. 


Table 2 

Scatter Diagram Showing Relation Between Ratings of Key Punch Operators and Scores 
on the Minnesota Numbers Test 

[Scattergram; cell entries are not legible. Columns: scores on the Minnesota 
Numbers Test in five-point intervals (100-104, 105-109, ... 155-159, 160 plus) 
and Total. Rows: Rating Groups I, II, III (“good”) and IV, V, VI (“poor”), 
followed by number “good,” number “poor,” cumulative “good,” cumulative 
“poor,” cumulative total, % pass, and % “good” of those who pass. 
Zones marked on the diagram: Uncertain 19/20; Success 27/6 or 4.5/1.] 


Table 3 

Scatter Diagram Showing Relation Between Ratings of Key Punch Operators and Scores 
on the Name Finding Test 

[Scattergram; cell entries are not legible. Columns: scores on the Name 
Finding Test and Total. Rows as in Table 2. At the lowest score the 
cumulative entries are 53 “good,” 29 “poor,” and 82 in all (100% pass, 
65% “good” of those who pass). Zones marked on the diagram: Failure 1/8; 
Doubtful 4/4; Success 48/17 or 2.8/1.] 
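The percentages and ratios quoted for the single cutting scores can be checked directly from the counts given in the text: 33 of 82 operators scored 130 or more on Minnesota Numbers, 27 of them rated “good,” and 65 of 82 scored above 12 on Name Finding, 48 of them rated “good.” The short sketch below reproduces that arithmetic; the function name and the dictionary keys are illustrative choices, not the author's terms.

```python
def cutting_score_summary(n_total, n_pass, n_good_pass):
    """Summarize one cutting score from simple counts.

    n_total: size of the criterion group.
    n_pass: number at or above the cutting score.
    n_good_pass: number of those passing who were rated "good".
    """
    n_poor_pass = n_pass - n_good_pass
    ratio = round(n_good_pass / n_poor_pass, 1) if n_poor_pass else None
    return {
        "pct_pass": round(100 * n_pass / n_total),
        "pct_good_of_pass": round(100 * n_good_pass / n_pass),
        "good_to_poor": ratio,
    }


# Counts taken from the text.
minnesota_numbers = cutting_score_summary(82, 33, 27)
name_finding = cutting_score_summary(82, 65, 48)
```

The Minnesota Numbers call gives 40% passing, 82% “good” of those who pass, and a ratio of 4.5 to 1; the Name Finding call gives 79%, 74%, and 2.8 to 1, matching the figures in the text.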


Multiple Cutting Scores 


A study of the scattergrams suggested that a 
combination of critical scores from two or three 
tests might be even more effective in discriminating 
between “good” and “poor” key-punch 
operators than the cutting score from just 
one test. The object was to find a combination 
of scores that would be passed by a large percentage 
of the clerks rated “good,” and that would eliminate 
as many as possible of those rated “poor.” 
In order to do this, it was decided to list the 
scores for individuals by rating groups. Table 
4 shows the scores of the applicants who were 
rated in the second and fifth of the six rating 
groups. The rest of the list is omitted to 
save space. 
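The procedure described here, listing each individual's scores and marking those who fail a given combination of cutting scores, can be sketched as a small tabulation. In the sketch below an applicant passes only if every score meets its minimum. All names, the four applicants, and the cutoff values are hypothetical and serve only to show the counting.

```python
def passes(scores, cutoffs):
    """True if every test score meets or exceeds its cutting score."""
    return all(scores[test] >= minimum for test, minimum in cutoffs.items())


def evaluate_combination(applicants, cutoffs, last_good_group=3):
    """Count who passes a combination of cutting scores and how many are "good".

    Rating groups 1-3 are treated as "good" and 4-6 as "poor",
    following the division used in the article.
    """
    passed = [a for a in applicants if passes(a["scores"], cutoffs)]
    n_good = sum(1 for a in passed if a["rating_group"] <= last_good_group)
    return {
        "n_pass": len(passed),
        "pct_pass": 100 * len(passed) / len(applicants),
        "pct_good_of_pass": 100 * n_good / len(passed) if passed else None,
    }


# Hypothetical applicants and cutoffs (illustration only).
applicants = [
    {"scores": {"minn": 146, "series": 12, "name": 15}, "rating_group": 2},
    {"scores": {"minn": 118, "series": 10, "name": 14}, "rating_group": 5},
    {"scores": {"minn": 107, "series": 11, "name": 16}, "rating_group": 5},
    {"scores": {"minn": 131, "series": 8, "name": 17}, "rating_group": 1},
]
cutoffs = {"minn": 110, "series": 9, "name": 13}
result = evaluate_combination(applicants, cutoffs)
```

Trying several dictionaries of cutoffs against the same list is the programmatic counterpart of the columns of X's the article describes.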

Table 5 shows the results of trying different 
combinations of cutting scores. The combinations 
listed here are but a fraction of the many 
trials that would theoretically be necessary to 
exhaust all the possibilities of combinations of 
cutting scores. But from a study of the list 
(given in part in Table 4) and the scattergrams, 
these seemed to be the only combinations that 
offered any promise. In Table 5 the critical 
score combinations are arranged in descending 
order, assuming that hiring standards would 
be lowered successively as the labor market 
tightened. Combinations of test scores for 
this group give better results at all score levels 
than the multiple regression equation, possibly 
because the distributions are not linear. The 
best combinations for different levels of hiring 
are shown in Table 6. 

In studying the list of scores, it was found 
that there were several individuals in Group 
VI who made high scores on all three tests. 
Some of these were found to be “problem 
cases.” Of one it was said, “Hated the work; 
was transferred to credit analysis.” Of another, 
“Hired as an experiment; had only one 
arm. Left because of a nervous condition.” 
Of still another, “Had ability but refused to 
cooperate. Poor health and attitude.” These 
individuals were, of course, left in the sample. 
These comments help to explain, however, why 
it is never possible to get 100 percent success in 
selecting key-punch operators, or any other 
workers, on the basis of test results alone. 


Table 4 

Scores on Tests Taken by Key Punch Operators at Time of Application for Employment 

Note: Scores are given for individuals in two of the six rating groups. The scores listed at the top of the 
5 columns at the right are some of the combinations of cutting scores on indicated tests which were selected 
for trial. An X was placed after the scores of each individual who failed to “pass” the particular combination 
of cutting scores. By counting these X’s in a given column, it is possible to show the number of good 
and poor clerks who would have been eliminated if hiring had been based on these standards. These 5 combinations 
are representative of the different combinations which were tried. 

[Individual entries are largely illegible. Columns: (1) Minnesota Numbers, 
(2) Number Series, (3) Name Finding, and five cutting-score combinations. 
Rows: Group II, Nos. 1-16; Group V, Nos. 1-9; number passed in Group II; 
number passed in Group V. Legible Minnesota Numbers scores for Group V, 
Nos. 1-9: 146, 118, 118, 112, 140, 107, 125, 117, 127.] 


Table 5 

Possible Selections of Key Punch Operators from Test 
Scores, Using Indicated Cutting Scores on the Given 
Tests and Showing the Per Cent Who Pass and the 
Per Cent Rated “Good” of Those Who Pass. 

Note: N is 82, with 53 rated “good” and 29 rated “poor”. 

                                                         Per Cent Rated 
Minn.          No.          Name         Per Cent        “Good” of Those 
Nos.           Series       Finding      Who Pass        Who Pass 
[illegible]    9            13           59              86 
[illegible]    8            12           73              78 

[The remaining rows are only partly legible. Legible fragments, in order: 
9, 70, 79; 70, 79; 9, 68; 78; 79; 77; 89; 85.] 


Comparing Results with Previous Study 


It is interesting to compare the results of the 
present study with the earlier study of 39 
bookkeepers in the same organization. Since 
the same clerical aptitude tests were used, the 
value of using multiple cutting scores can be 
studied in this sample also. In the bookkeeper 
group, slightly better prediction was obtained 
by means of the multiple regression formula 
than with multiple cutting scores. The R in 
this case was .70 as found by the Doolittle 
method, the tests which contributed being: 
(1) Minnesota Numbers, (2) Number Series 
Completion, and (3) Name Finding. The in- 
tercorrelations are given in Table 1. Here the 
criterion was actual rate of production, usually 
a more reliable criterion than ratings of super- 
visors. An examination of the results secured 
with multiple cutting scores in the bookkeeper 
group confirms the satisfactory prediction at- 
tained in the key punch group. Table 6 lists 
the successful predictions from the same critical 
scores shown in Table 4 when applied to the 
39 bookkeepers. The dividing line between 
“good” and “poor” clerks was taken at the 
production rate of 105, which was the production 
average of the group at the time the study 
began in 1937; 29 operators achieved 105 or 
better and 10 operators achieved less than 105. 
In Tables 1 and 6 the results of both studies 
are given for comparison and to show how one 
study confirms the other. 


Table 6 

Prediction Results for the Key Punch Group and the Bookkeeper Group Using Indicated 
Cutting Scores on the Three Tests 

Note: 82 Key Punch Operators, 53 rated “good” and 29 rated “poor”; 39 Bookkeepers, 29 “good” and 10 “poor.” 

                                        Key Punch Group             Bookkeeper Group 
                                                   Per Cent                    Per Cent 
                                                   Rated “Good”                Rated “Good” 
Minn.    Number    Name          Per Cent          of Those       Per Cent     of Those 
Nos.     Series    Finding       Who Pass          Who Pass       Who Pass     Who Pass 
110      9         [illegible]   59                86             54           86 

[The remaining rows are only partly legible. Legible fragments, in order: 
110; 79, 79, 81; 83; 86; 79, 84, 73.] 

In view of the way in which the results of 
the bookkeeper group confirm those of the 
key-punch operators it is interesting to note 
that the subjects were quite different in several 
respects. Although the key punch operators 
were mostly inexperienced at the time of hiring, 
the bookkeepers were almost all experienced 
clerks. Their average length of service was 9 
years and 2 months. The bookkeeper study 
took place in the years 1937-1940 and the key 
punch study after 1944. One group operated 
the Burroughs adding-bookkeeping machine, 
and the other was trained to use the Reming- 
ton-Rand key-punch machine. 

Comparison of the two prediction formulae 
shows the differences in the proportionate con- 
tributions of the three tests: 


Bookkeepers: 
X₁ = .19 × Minn. Nos. + 1.34 × No. Series + 1.27 × Name Finding + 52.4. 

Key Punch operators: 
X₁ = .15 × Minn. Nos. + .97 × No. Series + .35 × Name Finding + 67.3. 
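To make the difference between the two prediction formulae concrete, the sketch below applies both sets of weights to the same set of test scores. The weights and constants are those printed in the article; the function name, the dictionary layout, and the sample scores (130, 9, 13) are hypothetical, and the scale of the predicted key-punch criterion is not stated in the text.

```python
# Regression weights as printed in the article.
WEIGHTS = {
    "bookkeepers": {"minn": 0.19, "series": 1.34, "name": 1.27, "constant": 52.4},
    "key_punch": {"minn": 0.15, "series": 0.97, "name": 0.35, "constant": 67.3},
}


def predicted_criterion(group, minn, series, name):
    """Predicted criterion score for one applicant under one group's formula."""
    w = WEIGHTS[group]
    return w["minn"] * minn + w["series"] * series + w["name"] * name + w["constant"]


# Hypothetical applicant: same scores evaluated under both formulae.
bookkeeper_prediction = predicted_criterion("bookkeepers", minn=130, series=9, name=13)
key_punch_prediction = predicted_criterion("key_punch", minn=130, series=9, name=13)
```

For bookkeepers the predicted value is on the production-rate scale, where 105 was the dividing line between “good” and “poor.” Name Finding carries much less weight in the key-punch formula (.35 against 1.27), which is the proportionate difference the comparison is meant to show.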


Summary 


A study was made of prediction of success 
in a group of 82 key punch operators on the 
basis of a battery of three clerical tests. Al- 
though the Doolittle multiple R was low, it 
was possible to make satisfactory predictions 
on the basis of a combination of cutting scores 
on the three tests. 

The results of this study corroborate the 
findings of the bookkeeper study made in 1941. 
The tests used were Minnesota Numbers, 
Number Series Completion, and Name Finding. 
They require only fourteen minutes of testing 
time and predict equally well for experienced 
and inexperienced clerks. 

Received April 12, 1950. 

Early publication. 





A Test Battery for Actuarial Clerks * 


Adam Poruben, Jr. 
Personnel Division, Metropolitan Life Insurance Company 


In June, 1947, the writer was requested to 
validate a test battery for the selection of 
Actuarial Clerks in the Metropolitan Life Insurance 
Company. At this time, a review of 
the job description of the Actuarial Clerk, in 
cooperation with an Assistant Actuary, revealed 
at least three characteristics necessary 
for the performance of this job, namely, mental 
alertness, numerical aptitude, and memory. 
Because of practical considerations, only a 
small sample of 12 Actuarial Clerks was available 
at this time. Six tests, designed to measure 
the above characteristics, were administered 
to this group. The rank correlations 
between test scores and overall performance 
on the job indicated that five of these tests 
might have some value for the prediction of 
success on the Actuarial Clerk job. The rank 
correlations for the individual tests were fairly 
low but when the raw scores were transformed 
into standard scores and a composite score was 
found for each individual, a rank correlation 
of +.71 was obtained between the composite 
scores and overall performance on the job. 
In spite of the small sample, the writer and 
management decided to use these tests tentatively 
until it would be possible to validate the 
tests on a larger sample. The purpose of 
this paper is to report the results of such 
a study. 
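The preliminary analysis described above has two steps: raw scores on each test are converted to standard scores and summed into a composite, and the composite is then rank-correlated with job performance. A minimal sketch of both steps follows. It assumes an unweighted sum of standard scores and the usual Spearman formula without ties, neither of which the article spells out; the function names are hypothetical.

```python
from statistics import mean, pstdev


def standard_scores(raw):
    """Convert raw scores to standard (z) scores; requires some spread in `raw`."""
    m, s = mean(raw), pstdev(raw)
    return [(x - m) / s for x in raw]


def composite_scores(tests):
    """Sum standard scores across tests; `tests` holds one score list per test."""
    columns = [standard_scores(t) for t in tests]
    return [sum(person) for person in zip(*columns)]


def ranks(values):
    """Rank values from highest (1) to lowest; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n(n^2 - 1)), without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))
```

With the 12 clerks of the preliminary sample, `spearman_rho(composite_scores(tests), performance)` would be the counterpart of the +.71 reported; a coefficient based on so few cases has a wide margin of error, which is why the larger study was undertaken.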


The Sample 


The sample consisted of 125 Actuarial Clerks 
who had taken the five tests used in the original 
study. The majority of these Actuarial Clerks 
were called either Computing or Calculating 
Clerks. All of these employees had at least 
six months of service on the particular job 
they were doing at the time of this study. The 
average length of service on the job for the 
entire group was 13.5 months with an S.D. of 
7.65 months. The average length of service 
with the company for the entire group was 9.5 
years with an S.D. of 8.10 years. 

* Grateful acknowledgment is made to Mr. T. A. 
Crowther, Mr. V. G. Christman, Mr. V. A. Lane, and 
Mr. H. L. Rhoades for their assistance in this study. 

The Home Office Clerical Jobs of the Metro- 
politan Life Insurance Company are at the 
present time classified into 19 levels, ranging 
from level A to level S. In order to give the 
reader an idea of the complexity and respon- 
sibility of the various jobs represented in the 
above sample, the job level distribution of 
these 125 jobs is given. It is as follows: 15 B’s, 
14 C’s, 25 D’s, 18 E’s, 17 F’s, 16 G’s, 9 H’s, 
3 I’s, and 8 J’s. 


Description of the Tests 


1. Otis Self-Administering Test of Mental 
Ability. This test is well known and need not 
be described in detail. The 30-minute time 
limit was used. The author reports a test- 
retest reliability of .92. 

2. L.O.-M.A. 4-M Test. This test is pub- 
lished by the Life Office Management Associa- 
tion and consists of 39 problems: 11 percent- 
ages, 10 fractions, 7 decimals, and 11 numerical 
reasoning problems. It has a time limit of 
one hour. The corrected odd-even reliability 
of this test for the 125-case sample used in this study 
was +.91. A test-retest reliability coefficient 
of +.86 was also obtained for this test 
on a small sample of 24 company employees. 

3. Ratio-Proportion Test. This test was constructed 
by the writer and consists of 11 
problems involving the use of ratio and proportion. 
Three of the problems call for simple 
interpolation of tables since considerable inter- 
polation of tables is done on most of the 
Actuarial Clerk jobs represented in this study. 
It has a time limit of 27 minutes. Its corrected 
odd-even reliability was found to be +.77 for 
the 125-case sample of this study. 

4. Logical Memory Test. This test consists 
of a short story written in a single paragraph. 
The various ideas in this story are separated by 
diagonal lines to facilitate memorization. The 
subject reads this story for two minutes and 
then reproduces from memory as much of the 
story as he can. The score is the number of 
ideas correctly reproduced. No estimate of 
reliability was possible for this test. 

5. Wesman Personnel Classification Test. 
Only Part II of this test was used. This con- 
sists of twenty numerical problems. It has a 
time limit of 10 minutes. Its corrected odd- 
even reliability for the 125-case sample studied 
was +.85. The author reports a reliability 
coefficient of .82 for 174 chain store clerks. 
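Several of the reliabilities above are described as “corrected odd-even” coefficients. The correction conventionally applied to a correlation between odd and even halves of a test is the Spearman-Brown formula, and the sketch below assumes that is what was used here; the article does not name the formula. The function names are hypothetical.

```python
def spearman_brown(r_half, factor=2.0):
    """Estimated reliability of a test `factor` times as long as the part scored.

    With factor=2 this steps an odd-even (half-test) correlation up to
    the reliability of the full-length test.
    """
    return factor * r_half / (1 + (factor - 1) * r_half)


def half_test_correlation(r_full):
    """Odd-even correlation implied by a corrected full-length reliability."""
    return r_full / (2 - r_full)


# Half-test correlation implied by the reported corrected value of .91
# for the L.O.M.A. test, under the Spearman-Brown assumption.
implied_half = half_test_correlation(0.91)
```

Under this assumption a corrected value of .91 corresponds to an uncorrected odd-even correlation of about .83.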


The Criterion 

The criterion consisted of ratings obtained 
on an experimental rating form. This form 
consists of eight traits: Knowledge of Work, 
Quality of Work, Quantity of Work, Ability to 
Learn, Cooperation, Interest in Work, Atten- 
dance, and Punctuality. Ratings on the first 
six traits and an over-all rating based on all 
traits were used as criteria in this study. The 
trait ratings are obtained on a five-degree 
graphic scale. The scales on Knowledge of 
Work, Quality of Work, and Quantity of 
Work have a range of 1 to 20; those for Ability 
to Learn, Cooperation, and Interest a range of 
1 to 10; those for Attendance and Punctuality 
a range of 0 to 5. The over-all rating is obtained 
by summing the point ratings on all of 
the eight traits. 

This experimental rating form was designed 
by the Company in 1948 in order to improve 
the employee-evaluation procedures. It was 
tried out on 3,275 non-supervisory clerical em- 
ployees in September 1948 and on 8,876 non- 
supervisory clerical employees in February 
1949. The results were quite satisfactory. 
The mean and S.D. of the September 1948 over-all 
ratings were found to be 69.8 and 11.16, respectively; 
the mean and S.D. of the over-all 
ratings on the 8,876 sample were found to be 
69.5 and 11.26, respectively. Both of these 
experimental runs were first explained to the 
Company managers by the Personnel Officer, 
who urged them to give the best and most 
unbiased ratings since no official records were 
to be kept and no administrative actions, such 
as salary adjustments, were to be taken on the 
basis of these ratings. Also the managers were 
urged to rate only employees with at least six 
months experience and to disregard any ratings 
which they had given any employee at any 
time before these experimental runs. Also the 
managers had no knowledge of the fact that a 
second experimental run would be made at the 
time of the first run. Therefore, it is safe to 
say that the two ratings on the 3,275 employees 
were made independent of each other. 

As was mentioned before, two ratings were 
available for 3,275 employees. From this 
group, 2,210 were selected who were still on 
the same job at the time of the second rating 
as the first rating. These 2,210 employees, 
therefore, had at least six months experience 
on their job at the time of the first rating and 
at least one year at the time of the second 
rating. Most of the jobs held by these 2,210 
employees are such that the employee can be 
trained to do the work in a few months. 
Thus, at the end of six months the employee 
should have attained enough skill on his or 
her particular job that fairly accurate evalua- 
tion of his or her work is possible. When the 
two sets of over-all ratings for these 2,210 
employees were correlated, a coefficient of reliability 
of +.69 was found. 
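The rate-rerate reliability is the correlation between the first and second over-all ratings of the same employees. A minimal sketch of that computation, assuming the ordinary product-moment coefficient, is given below; the function name and the handful of ratings are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical first and second over-all ratings for six employees.
first_rating = [62, 75, 70, 81, 58, 69]
second_rating = [65, 72, 74, 79, 63, 66]
rate_rerate = pearson_r(first_rating, second_rating)
```

Applied to the two sets of 2,210 over-all ratings, the same calculation would yield the coefficient reported in the text.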

The means and S.D.’s of these two sets of 
ratings for the 2,210 employees were not sta- 
tistically significantly different. The first 
ratings had a mean of 69.69 and an S.D. of 
10.97; the second ratings had a mean of 70.35 
and an S.D. of 11.08. 

For obtaining estimates of reliability of the 
trait ratings, the ratings of the employees from 
the Actuarial Division who were rated twice 
were used. It was found that there were 171 
such employees who were rated twice, who 
were on the same job at the time of the second 
rating as the first, and who had at least six 
months experience on their job at the time of 
the first ratings. The rate-rerate coefficients 
of reliability were found to be as follows: 
+.64 for quality of work, +.65 for quantity 
of work, and +.71 for ability to learn. 


Results 


In order to see if the tests differentiated be- 
tween the outstanding and the poor workers, 
the test scores of the thirty-two (25.6%) em- 
ployees with the highest over-all ratings were 
compared with the test scores of the thirty 
(24.0%) employees with the lowest over-all 
ratings. The results are shown in Table 1. 

Table 1 

Average Test Scores and Ratings for the Best and Poorest Employees 

[Columns: Variable; Upper Group mean and S.D.; Lower Group mean and S.D. 
Rows: Ratio-Proportion, L.O.-M.A. 4-M, Personnel Classification, Otis Mental 
Ability, Logical Memory, Over-all Ratings. The means and significance 
markers are not legible. Legible S.D. entries, in row order: 2.74, 9.38, 
3.95, 11.57, 9.22, 3.18 in one S.D. column and 2.40, 7.71, 3.64, 11.09, 
10.73 in the other.] 

* Significant at the one per cent level. 
** Significant at the five per cent level. 


Distributions for all variables were drawn 
and found to be approximately normal. The 
scatter plots between the variables indicated 
that the use of the Pearsonian product-moment 
correlations was permissible. The means, 
S.D.’s, test intercorrelations and validity coefficients 
are shown in Table 2. 

The correlations between the tests and rat- 
ings on Cooperation, Interest in Work, and 
Knowledge of Work were not significant at the 
one per cent level and, therefore, are not shown 
in Table 2. 

Two more correlations were found which do 
not appear in Table 2. One of these was be- 
tween length of service with company and 
over-all ratings. It was found to be .12. The 
other was between over-all ratings and the 
job level. This was found to be .32. 

The Wherry-Doolittle technique was used 
to combine the tests into batteries. All the 
test-criterion correlations were corrected for 
attenuation in the criterion before this technique 
was applied. The only combination that 
gave a higher validity coefficient than the 
single tests was the combination of the Ratio-Proportion 
and Wesman Personnel Classification 
Tests, which resulted in a shrunken multiple 
R of .457 and Beta weights of .333 and 
.159, respectively, with the criterion Ability 
to Learn. Apparently there is considerable 
overlap among the tests. 
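Correcting a validity coefficient for attenuation in the criterion means dividing the observed test-criterion correlation by the square root of the criterion's reliability. The sketch below shows that step, assuming the standard formula. The reliability of .71 is the rate-rerate value reported for Ability to Learn; the observed correlation of .30 is a hypothetical figure, and the function name is an illustrative choice.

```python
import math


def correct_for_criterion_attenuation(r_xy, r_yy):
    """Estimate the test-criterion correlation for a perfectly reliable criterion.

    r_xy: observed correlation between test and criterion ratings.
    r_yy: reliability of the criterion (for example, rate-rerate).
    """
    if not 0 < r_yy <= 1:
        raise ValueError("criterion reliability must be greater than 0 and at most 1")
    return r_xy / math.sqrt(r_yy)


# Hypothetical observed validity with the reported reliability of .71.
corrected = correct_for_criterion_attenuation(0.30, 0.71)
```

Because the divisor is less than 1, the corrected coefficient always exceeds the observed one, so a multiple R built on corrected correlations should be read as an estimate against an error-free criterion rather than against the ratings actually obtained.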
Discussion 

It can be seen from Tables 1 and 2 and the 
results given in the last section that the Ratio- 
Proportion Test is the best predictor since it 
correlated significantly with ratings on Quality 
of Work, Ability to Learn, and with over-all 
ratings. The L.O.-M.A. 4-M came out about 
the second best in that it was found to correlate 
significantly with the ratings on Quality of 
Work and Ability to Learn. The Wesman 
Personnel Classification Test also correlated 
significantly with two of the criteria of job 
success—Quality of Work and Ability to 
Learn. The Otis was found to correlate significantly 
with only one of the criteria, namely, 
Ability to Learn. The Logical Memory Test 
showed no significant correlations with any of 
the seven criteria used in this study. 


Table 2 

Test Intercorrelations and Validity Coefficients for 125 Actuarial Clerks * 

     Description of Variable            Mean     S.D. 
A    Personnel Classification           11.4     3.78 
B    Otis Test of Mental Ability        52.0    11.57 
C    Logical Memory                     45.6     9.99 
D    L.O.-M.A. 4-M (Arithmetic)         19.9     8.61 
E    Ratio-Proportion                    6.9     2.70 
F    Total Rating                       71.3     9.89 
G    Quality of Work                    14.1     2.57 
H    Quantity of Work                   13.7     2.34 
I    Ability to Learn                    7.3     1.21 

[The correlation columns (B through I) are not legible. Legible fragments: 
31, 82, 56, 18, 36.] 

* A correlation of .23 is significant at the 1% level. 

In view of the above results, it can be safely 
concluded that the Ratio-Proportion, the 
L.O.-M.A. 4-M and the Wesman Personnel 
Classification Tests are valid for the prediction 
of success on the Actuarial Clerk job. Al- 
though the validity coefficients are not high, 
it can be stated that only a relatively small 
percentage of the employees tested, those with 
the higher test scores, are considered for selec- 
tion and placement after their other personnel 
records such as years of service, attendance, 
former ratings, etc. are reviewed. Under these 
conditions even relatively small validity coeffi- 
cients have some predictive value. 

It is not surprising that the tests did not 
show any validity for the prediction of behavior 
included under the trait Cooperation nor for 
Interest in Work. The tests were designed 
to measure aptitudes and not personality 
characteristics or interest. The results bear 
this out. 

The analysis in connection with the multiple 
correlation work indicated that there is considerable
overlap among the tests. It appears
that a two-test combination consisting of the
Ratio-Proportion and the Wesman Personnel
Classification Tests has as good or better 
validity than all five tests combined. The 
Logical Memory Test definitely does not add 
to the efficiency of prediction. This is prob- 
ably because of its poor reliability due to 
subjective scoring. Also it appears that what- 
ever is measured by the Otis is just as well 
measured by the L.O.-M.A. 4-M, Ratio-Pro- 
portion and the Wesman Personnel Classifi- 
cation Tests. This is not too surprising since 
fairly large portions of these tests, especially 
the first two, consist of numerical reasoning 
problems. 
Summary 

A battery of five tests, designed to measure 
mental alertness, numerical aptitude and 
memory, was tried out on a sample of 125 
Actuarial Clerks in the Metropolitan Life In- 
surance Company. All of these employees had 
at least six months of service on the particular 
job they were doing at the time of this study, 
the average length of service on the job for the 
entire group being 13.5 months with an S.D.
of 7.65 months. The average length of service
with the Company for the entire group was 
9.5 years with an S.D. of 8.10 years. Their 
job levels ranged from B to J. 

The criterion consisted of ratings on six 
traits and an over-all rating. The reliability 
of the over-all ratings was estimated by ob- 
taining a correlation between two ratings on 
2,210 employees, the two ratings being five 
months apart. All of these employees were 
on the same job during both ratings, had at 
least six months experience on their job at the 
time of the first rating and were rated by the 
same supervisor at both times. This correla- 
tion was found to be +.69. Another measure 
of the stability of the over-all ratings was ob- 
tained by finding the means and S.D.’s of the 
two sets of ratings. The first ratings had a 
mean of 69.7 and an S.D. of 10.97; the second 
ratings had a mean of 70.4 and an S.D. of 11.08. 
The reliability coefficients for the traits on a 
sample of 171 employees ranged from +.64
to +.71. 

Two of the tests were found to differentiate 
significantly between the best 25 per cent 
and the worst 25 per cent of the employees. 
These were the Ratio-Proportion and the 
L.O.-M.A. 4-M Tests. The Wesman Personnel
Classification Test differentiated significantly 
between these two groups at the five per cent 
level. 

The Ratio-Proportion Test was found to be 
the most valid, having significant correlations, 
at the one per cent level, with three of the seven 
criteria of job success, namely, Quality of 
Work, Ability to Learn and over-all ratings. 
The L.O.-M.A. 4-M was the second best pre- 
dictor having significant correlations with 
Quality of Work, and Ability to Learn. The 
Wesman Personnel Classification Test also cor- 
related significantly with two of the seven 
criteria, namely, Quality of Work and Ability 
to Learn. 

Great overlap was found to exist among the 
five tests. On the basis of the multiple cor- 
relation results it appears that the combination 
consisting of the Ratio-Proportion and the 
Wesman Personnel Classification Tests has as 
good or better validity than all five tests 
combined. 


Received March 2, 1950. 
Early publication. 





Changes in Subjective Fatigue and Readiness for Work 
During the Eight-Hour Shift * 


John W. Griffith, Willard A. Kerr, Thomas B. Mayo, Jr. 
Illinois Institute of Technology


and
John R. Topal 


Belden Manufacturing Company 


Although remarkable progress has been made 
in industrial psychology in recent decades, it 
is an interesting fact that remarkably little 
research has been reported on the changes 
which probably occur in the subjective fatigue 
and readiness for work of personnel from one 
part of a standard work shift to another. As 
Ryan (5) has pointed out, most “fatigue” research
has reflected an academic preoccupation
either with trying to measure objective fatigue
or attempting to define fatigue with precision,
the latter task being one which Muscio (3)
decided as early as 1921 to be practically 
hopeless. Because of the relative absence of 
literature on the subjective feeling changes 
with work, the problem is largely unmentioned 
in existing textbooks on personnel and indus- 
trial psychology. 

Production curves for the work day in 
various factory operations sometimes are pre- 
sented for their possible relevance (1, 4) to 
fatigue, but it is admitted generally that many 
factors other than fatigue, however defined, 
determine the production curve. Feelings of 
tiredness do not necessarily change in expected 
directions with changes in rate of output. 
However, to date, the scheduling of work and 
locating of rest pauses in industry have been 
done largely according to guesswork or with 
reference to incidental practical considerations 
independent of worker readiness. 

If considerable consistency is found in worker 
feelings of tiredness and readiness to work in 
various types of work, it is possible that such 
feeling curves will be useful in the more in- 
telligent scheduling of work and rest pauses 
in business and industry. 


* Senior author is W. A. Kerr. 


The Present Research 


The present research entertains the hypothe- 
sis that employees in representative types of 
work possess definite attitudes as to when 
during the work shift they are most ready to 
work and when they are most tired. Em- 
ploying the “tear ballot” technique (2) a 
measuring device was constructed which ob- 
tains from the worker his estimate of when in 
each half of the eight-hour shift he feels most 
rested and most tired. All replies of 379 em- 
ployees were anonymous except that eleven 
ballots were temporarily coded for a crude 
test-retest reliability check in addition to other 
internal consistency evidence. Included in 
the sample were 232 male manual workers 
(handlers and sorters of light-to-100-pounds
materials), 75 foremen in a rawhide factory,
and 72 office workers (48 male, 24 female). 
Foremen were measured at a regular meeting 
of the Chicago Rawhide Management Club 
and the other personnel were measured while 
at work in a distributing organization (manual 
workers) and in an electronics plant (office 
workers). The supervisory and office per- 
sonnel were regular day shift employees but 
the 232 manual workers began their various 
shifts in the three-hour period from 3:30 to 
6:30 p.m.


Results 


Obtained subjective reports of tiredness and 
readiness for work at various hours of work 
were analyzed with respect to age, sex, and 
type of work performed. Repeat-test relia- 
bility coefficients for small groups ranged from 
.69 to .92.

Age. Office and manual workers were
studied separately as to possible age differences
in work feelings. Each group was divided 
into younger (20-35) and older (36-65) per- 
sonnel and per cent of each age group feeling 
tired (and rested) at each of the eight hours of 
the shift was calculated. Considering each 
half of the work shift separately, normal 
chance expectancy, assuming no change in 
work feelings with successive hours of work, 
would yield 25 per cent response at each hour 
of each half of the work shift. Actually, con- 
spicuous changes in both tiredness and readi- 
ness for work are reported by both older and 
younger workers in successive hours of work. 
Older workers, both office and manual, report 
greater average feeling deviations from chance 
expectancy than do younger workers. This
tendency, shown clearly in Table 1, seems to 
indicate that older workers are introspectively 
more conscious of feelings of tiredness and 
readiness for work at specific hours of the work 
shift than are younger workers. It is possible, 
of course, that “objective” fatigue is equally
present in the younger workers but that the 
younger workers are less affected in their sub- 
jective feelings by their continually changing 
organic conditions than are older workers. 
Another explanatory hypothesis is that younger 
workers simply have less insight into their 
feeling changes with successive hours of work. 




Whatever the most tenable explanation, older 
workers in this study report significantly 
greater extremes of work feelings than do their 
younger associates. 
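
The mean deviation from chance expectancy used in this comparison can be sketched as follows. This is a minimal illustration, assuming the statistic is the average absolute difference between the reported per cent at each hour and the 25 per cent expected by chance within each four-hour half of the shift. The function name is hypothetical, and the eight percentages are an assumed reading of one column of Table 1 (manual workers aged 20-35, "most tired"), not figures confirmed by the text.

```python
def mean_deviation_from_chance(percents_by_hour, hours_per_half=4):
    """Average absolute deviation of hourly percentages from chance expectancy.

    Within each half of the shift, equal response at every hour would give
    100 / hours_per_half per cent (25 per cent for a four-hour half).
    """
    expected = 100.0 / hours_per_half
    deviations = [abs(p - expected) for p in percents_by_hour]
    return sum(deviations) / len(deviations)


# Assumed reading of Table 1: hours 1-4 (first half), then hours 5-8 (second half).
manual_younger_most_tired = [32, 5, 16, 47, 23, 15, 25, 37]
print(mean_deviation_from_chance(manual_younger_most_tired))
```

Under that assumed reading the result is 10.25, which would agree with a tabled mean deviation of 10.3 for the same group.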

Sex. Since all the manual workers are male, 
the sex comparison is limited to office personnel 
(48 males and 24 females), groups too small
for any except suggestive comparisons. A 
suggestive tendency is present for female em- 
ployees to report greater extremes of tiredness 
than do males. 

Type of Work. Curves of work feeling 
throughout the work spell for manual work, 
office work, and supervising are displayed in 
Figures 1 and 2. It is significant that these 
three curves in each graph are all highly 
similar, despite the fact that they are derived 
from reports of employees doing dissimilar 
types of work and in different firms and in- 
dustries. Introspectively, apparently, workers 
of widely differing types experience substan- 
tially the same feelings of tiredness and readi- 
ness for work at specific periods in the work 
spell. The similarity of these curves is all the 
more striking when it is considered that the 
manual workers are “swing” shift rather than
regular day shift personnel. Extent of sub- 
jective tiredness feelings appears from Figure 
2 to be in part a function of degree of manual 
effort involved in jobs performed. Supervisors
show minimal variation about the line (25 per
cent) of chance expectancy while manual
workers show maximal variation for the three
groups studied. Period of maximal tiredness
seems to be the hour preceding the lunch
period, while another peak of tiredness is
during the last hour of the work shift. Readiness
for work in terms of per cent of personnel
reporting themselves as “most rested” is
maximum in the second hour after the beginning
of each work spell and it is minimal
in the last hour of each work spell.


Table 1

Per Cent of Each Age Group Among Manual and Office Workers Reporting Feelings of “Most Tired” and
“Most Rested” at Each Hour of the Work Shift and the Mean Deviations
of Such Reports from Normal Expectancy

                 Per Cent “Most Tired”              Per Cent “Most Rested”
                 Manual          Office            Manual          Office
Hour of          Age    Age      Age    Age        Age    Age      Age    Age
Work Shift       20-35  36-65    20-35  36-65      20-35  36-65    20-35  36-65

1 32 27 41
5 3 18

16 18 0
47 52 41
100 100
23 12
15 12
25 35
37 41
100 100
10.3 14.5
4 17


27
36 S183
20 ee
17 3 6
100 100 100
28 230
35 4282
24 24 8612
13 9 12
100 100 100
6.5 123 163
84 ss 17


31 6


Fig. 1. Per cent of manual, office, and supervisory employees reporting maximal feelings of restfulness
at each hour of each half of the eight-hour work shift. [Line graph; separate curves for manual workers,
office workers, and supervisors; horizontal axis: hour of shift; vertical axis: per cent.]


Fig. 2. Per cent of manual, office, and supervisory employees reporting maximal feelings of tiredness
at each hour of each half of the eight-hour work shift. [Line graph; separate curves for manual workers,
office workers, and supervisors; horizontal axis: hour of shift; vertical axis: per cent.]


Summary 

Manual, office, and supervisory employees 
totalling 379 from three different establish- 
ments were measured with a Kerr “tear ballot” 
for subjective feelings of tiredness and restful- 
ness in the various hours of the eight-hour 
work shift. 

1. Manual, office, and supervisory personnel 
report significantly differential feelings of tired- 
ness or restfulness for various periods in the 
work shift. 

2. Older workers report significantly greater
variation of such feelings than do employees
under age 36.

3. Curves of tiredness feeling and restfulness 
feeling throughout the work shift are remark- 
ably similar for the manual, office, and super- 
visory employees in this study. The similari- 
ties are more impressive than the dissimilarities. 




4. Maximal subjective fatigue is reported 
in the fourth and eighth hours of the eight- 
hour shift. 

5. Maximal restfulness feeling is reported 
in the second and sixth hours of the shift, the 
second hour of each four-hour work spell. 

6. In possible future evaluation of the psy- 
chological and efficiency advisability of the 
six-hour day, it is recommended that such data 
as these reported here be obtained on em- 
ployees now engaged in six-hour shifts. Such 
new data should be examined particularly for: 
(a) less variability of tiredness feeling response;
and (b) relative absence of high tiredness peaks 
just before the middle and end of the work shift. 


Received September 16, 1949. 


References 


1. Goldmark, M. D., Hopkins, P. S. F., and Lee, F. S.
Studies in industrial physiology: fatigue in relation
to working capacity: comparison of an eight-hour
plant and a ten-hour plant. U.S. Public
Health Service, Public Health Bulletin No. 106,
1920.

2. Kerr, W. A. Where they like to work; work place
preference of 228 electrical workers in terms of
music. J. appl. Psychol., 1943, 27, 438-442.

3. Muscio, B. Is a fatigue test possible? British J. of
Psychol., 1921, 12, 31-46.

4. Rothe, H. F. Output rates among butter wrappers:
I. Work curves and their stability. J. appl.
Psychol., 1946, 30, 199-211.

5. Ryan, T. A. Work and effort. New York: Ronald
Press Co., 1947.





Accident Proneness of Factory Departments * 


Willard A. Kerr 
Illinois Institute of Technology 


The extent to which individual accident 
proneness exists or has been a determinant of 
physical casualties in industry plainly has been 
exaggerated by many earlier authoritative 
writers according to more recent evidence (3, 
7). Much factory data which appear at first 
examination to indicate that certain employees 
are persistent “repeaters” and therefore “acci- 
dent prone” fail to substantiate such conclusion 
upon detailed probability study. Accidents 
distributed by chance (under the theory that a 
certain approximate number are inevitable 
under the existing total work situation in a 
factory department) will supply some workers 
with no accidents, some with one, some with 
two, and a few with even three or more (7). 
Because such analysis actually does succeed 
in most factory experience in explaining much 
of the individual employee “repeat” accidents 
data, the time-honored approach of the psy- 
chologist and psychiatrist (4) which emphasizes 
identification of subtle personality conditions 
which predispose to accidents by some em- 
ployees seems to be a less promising approach 
than that which emphasizes study of the total 
psychological climate in which the typical em- 
ployee of a group works. If proneness (or lia- 
bility) to accidents exists such tendency may be 
a group psychological phenomenon as well as 
an individual psychological phenomenon. 
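
The chance distribution described above can be sketched numerically. This is a minimal illustration, assuming accidents fall on workers independently and at a common rate, so that the number per worker follows a Poisson distribution. The text does not name this model, and the function name, the 1,000 workers, and the mean of 0.5 accidents per worker are hypothetical.

```python
import math


def expected_workers_by_accident_count(n_workers, mean_accidents, max_count=3):
    """Expected number of workers with 0, 1, 2, ... accidents under pure chance.

    Assumes a Poisson distribution with the given mean per worker; the last
    category pools every count at or above max_count.
    """
    expected = {}
    cumulative = 0.0
    for k in range(max_count):
        p = math.exp(-mean_accidents) * mean_accidents ** k / math.factorial(k)
        expected[str(k)] = n_workers * p
        cumulative += p
    expected[f"{max_count}+"] = n_workers * (1.0 - cumulative)
    return expected


# Hypothetical department: 1,000 workers averaging 0.5 accidents each.
table = expected_workers_by_accident_count(1000, 0.5)
for count, workers in table.items():
    print(count, round(workers, 1))
```

Even with identical liability for every worker, some "repeaters" appear in the two and three-or-more categories, which is the point of the probability argument.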

The fact that intelligent safety engineers and 
industrial training personnel working with in- 
dividuals and equipment often are unable to 
take some factory departments out of the 
“accident prone” column even after years of 
intense effort is proof that many group (as well 
as individual) psychological conditions may be 
operating. 

The Present Study 


Subjects for this study were 53 accident 
prone and non-accident prone departments in
the Camden Works of RCA involving 12,060
employees. These data were collected in 1943.
Forty other variables were investigated in each
department.


* Acknowledgment of invaluable advice and assistance
in obtaining the accident data is made to O. C. Boileau,
Safety Department, Radio Corporation of America,
and to Dean F. H. Kirkpatrick, Bethany College, for
constructive criticism.


Accidents per hundred workers per year for 
these departments ranged from 0.0 to 22.7, 
although 38 of the departments had rates of 
less than four accidents per 100 workers. 
Severity ratings, based largely on days lost 
from work, also were obtained for each depart- 
ment with the advice of the plant safety 
director. These severity values ranged from 
0 to 75. 

It is only because of the grave importance of 
the objective that such unpromising potential 
correlates of accidents as some of these reported 
in this study were investigated. Of the forty 
variables studied only a few were significantly 
related to accidents, as expected, yet at least
two of these results have not been reported 
previously in accident literature; therefore, 
they may justify the entire investigation. 

Because both accident variable distributions 
were positively skewed and several of the 
variables studied consisted of dichotomous or 
two-interval data, the tetrachoric coefficient 
of correlation (2) was employed. The statis- 
tically significant correlations (five per cent 
level) in Table 1 are indicated according to 
use of Kelley’s reliability formula (6) and the 
Guilford-Lyons tables (5). 
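
For readers without the computing diagrams, a tetrachoric coefficient can be approximated from a fourfold table. This is a minimal sketch that swaps in the well-known cosine-pi approximation; it is not the diagram method of reference (2), and its values will differ somewhat from diagram readings when the splits are far from the medians. The function name and the cell counts are hypothetical.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    Fourfold table cells: a = high/high, b = high/low, c = low/high,
    d = low/low, so a and d are the concordant cells.
    """
    concordant_empty = a == 0 or d == 0
    discordant_empty = b == 0 or c == 0
    if concordant_empty and discordant_empty:
        raise ValueError("table is degenerate; coefficient is undefined")
    if discordant_empty:
        return 1.0
    if concordant_empty:
        return -1.0
    ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1.0 + math.sqrt(ratio)))


# Hypothetical split of 53 departments on two dichotomized variables.
r_tet = tetrachoric_cos_pi(18, 9, 8, 18)
print(round(r_tet, 2))
```

The approximation returns zero when the cross-products are equal and approaches plus or minus one as one diagonal empties.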

Inspection of these significant correlations 
reveals that accidents tend to occur with 
greatest frequency in those factory depart- 
ments with lowest intra-company transfer 
mobility rates, smallest per cent of employees 
who are female and on salary, least promotion 
probability for typical employee, and highest 
mean noise level. 

While departments highest in accident fre- 
quency usually also are above average in acci- 
dent severity, the severe accident departments 
have some systematic characteristics which are 
found less often in the high accident departments.
High severity departments are heavily
male in sex ratio for salary as well as production
personnel; they are low in mean promotion
probability, low in fertility of suggestion field,
low in employee suggestions contributed, high
(relatively) in average employee age level, and
higher in average employee tenure.


Table 1

Correlations between the Frequency and Severity of Accidents in 53 Factory Departments and Each of
40 Other Variables in a Large New Jersey Factory *

Accident Accident
Variable Frequency Severity
1. Number of production employees -16
2. Total employees A -.12
3. Per cent of employees who are male, production ; 63
4. Per cent of employees who are male, salary : 50
5. Per cent of employees who are production workers _ AS
6. Production employees per supervisor : .08
7. Mean hours worked per week per production male : -.29
8. Mean hours worked per week per production female : -.20
9. Mean base pay of production males d 22
10. Mean base pay of production females / -.18
11. Sex hours differential, mean : - .28
12. Sex wage differential, mean P -.14
13. Intra-company transfer mobility d .28
14. Sex-ratio imbalance : 51
15. Gross turnover rate (including accessions) . - 30
16. Avoidable turnover rate (including accessions) j 02
17. Avoidable separation rate . -
18. Per cent of employees who are salaried male : .20
19. Per cent of employees who are salaried female 13
20. Per cent membership in company athletic association - 05
21. Accident frequency 4
22. Accident severity i xx
23. Efficiency (plant manager rating, three-month period) ; -.21
24. Efficiency (mean rating of ten competent judges) -. 16
25. Mean job security (mean rating of twelve competent judges) 02
26. Mean supervisory quality (mean rating, twelve judges) . 08
27. Mean job prestige (mean rating, twelve judges) 3
28. Mean promotion probability (mean rating, twelve judges) j -.50
29. Mean job monotony (mean rating, twelve judges) Ad 03
30. Degree of completion of work (rating, suggestions supervisor)
31. Fertility of suggestion field (rating, suggestions supervisor)
32. Suggestion quota (established by suggestions supervisor) ‘ 12
33. Total suggestions submitted : - 40
34. Per cent of suggestion quota met . -54
35. Per cent of suggestions adopted F 16
36. Wage incentive system , -.35
37. Mean noise level j 13
38. Labor-management mean morale rating (mean of 39 and 40) - 40
39. Morale as rated by personnel manager : - 36
40. Morale as rated by union local officers (pres. and vice-pres.) 23 - 35
41. Youthfulness of employees (per cent under 26) - 57
42. Tenure (per cent employed more than twelve months) é 55

* Coefficients in bold face are statistically significant at the five per cent level or better.


Most of these correlations undoubtedly are
artifactual rather than causal in significance.
However, a few are worthy of further study
and interpretation. Possibly substantial intra-company
transfer mobility makes employees
more alert and interested in their work environment,
resulting in fewer accidents. The
cross-fertilization of ideas which probably ac- 
companies intra-plant mobility may act also 
to reduce accident hazards and promote posi- 
tive cooperation with safety personnel. 

The tendency for departments lowest in 
promotion probability to be high in both 
accident severity and frequency may be of 
considerable psychological significance. It is 
plausible that when promotion is too unlikely, 
the typical employee may develop accident 
prone attitudes of relative indifference to the work 
environment. A reasonable chance to get ahead 
may constitute an incentive which not only 
stimulates the employee to do better work but 
may make him more alert to avoid hazards 
which may detain him in his progress. 

Accident prone departments usually have 
above average noise levels. Whether the 
noise level is causal of accidents or merely an 
incidental correlate of hazardous factory opera- 
tions is not entirely clear; it appears to be 
both causal and incidental. Certainly the re- 
duction of excessive noise levels whenever 
such reduction is practicable can be expected 
to do more good than harm as regards accident 
records. 

The pattern of correlates of accident severity 
is somewhat different from that of accident 
frequency. As might be expected, maleness 
is a marked characteristic of severe-accident 
departments; probably females are rarely 
placed on the most dangerous jobs or in the 
most “strenuous” departments. Also, male 
employees tend to be older; Chaney and Hanna 
(1) found that the probability of fatal or dis- 
abling results is greater among older than 
among younger accident cases. 

Less easily explained, however, is the fact 
that severe accident departments are units 
which tend to show a poor performance in 
contributing to the plant suggestion system. 
Superficially, it appears that departments 
which lag in making constructive suggestions 
through the employee suggestion boxes lag 
also in correcting dangerous conditions and in 
passing tips around on how not to get hurt; 
the superficial interpretation easily may be 
the valid one. 

Another tenable hypothesis is that the aver- 
age “foresight factor’’ of intelligence is lower 
in the severe accident departments because 


169 


foresightful employees tend to avoid or transfer 
away from dangerous work departments. A 
third hypothesis, also possibly tenable, is that 
severe accident departments are those which 
have been so highly systematized and per- 
fected from the industrial engineering stand- 
point that the average worker feels no incentive 
to try to improve the work or workplace 
through either employee suggestion boxes or 
alertness to unexpected hazards. While this 
latter hypothesis is improbable, it does seem 
highly significant that departments which are 
high in suggestion fertility are low in accident 
severity. 


A New Frame of Reference 
for Safety Promotion 


Perhaps, as some of the correlations in this 
study seem to suggest, a fundamental change 
in the total psychological frame of reference 
in which the average employee works is the 
basic key to reduction of industrial accidents. 
Such a change probably can raise the likelihood that
fewer total accidents will happen.

A psychological work environment that re- 
wards the worker emotionally for being alert, 
for seeking to contribute constructive sugges- 
tions, for passing a tip to a co-worker on how 
best to do something or how not to get hurt 
appears from this research to be a profitable 
goal to work toward. Creating or promoting 
such an environment undoubtedly calls for a 
much broader perspective of approach to acci- 
dents than has hitherto been considered by 
most managements. 

An additional item of circumstantial proof 
that that which promotes alertness also tends 
usually to minimize accidents is the fact in this 
study that the departments with incentive-pay 
systems have no more accidents than other de- 
partments. Approximately half of the depart- 
ments studied are on incentive systems. These 
same departments are “problem” departments
in many respects (higher turnover, more 
monotonous work, less job prestige, and less 
promotion probability). In fact, they have 
almost all undesirable characteristics except 
accidents in greater quantity than do the non- 
incentive departments. The “normal expec- 
tancy” record as regards accidents in incentive 
departments practically is in defiance of
physical and even some psychological work
conditions. Even though incentive systems
rarely succeed as much as they theoretically 
should in motivating the worker, they never- 
theless appear to make him more alert to attain 
a reasonable productive goal and this alertness 
apparently makes him safer in his operations. 
These observations on incentive systems are, 
of course, somewhat speculative. However, 
the need for providing emotional rewards for 
alertness seems highly probable from this 
research. Such rewards might include eco- 
nomic rewards, prestige-building honors, extra 
privileges, and representation on special committees
and councils. These rewards held as
attainable goals by workers in “dead end” jobs
should operate to raise the average level of 
alertness, not just to hazards but to everything. 


Summary 


Accident severity and frequency were cor- 
related with each of forty other variables in 
the 53 departments of an electronics factory. 

1. Accident frequency is associated with low 
intracompany transfer mobility, small per cent 
of employees who are female and on salary, 
low promotion probability, and high noise level. 

2. Accident severity is associated with pre- 
dominant maleness, low promotion probability, 
low fertility of suggestion field, low suggestions 
record, non-youthfulness of employees, and 
high average tenure of workers. 




3. A common explanatory factor among the 
accident frequency correlates appears to be 
depressants to alertness. The same factor ap- 
pears to be present in most of the severity 
correlates. 

4. Industry should direct increased attention 
toward enlivening of the psychological work 
environment, particularly with reference to 
provision of more and more emotional reward 
goals as incentives to raise the average level 
of alertness. 


Received August 15, 1949. 


References 


1. Chaney, L. W., and Hanna, H. S. The safety movement
in the iron and steel industry. Bur. Labor
Statistics Report 234, 1918.

2. Chesire, L., Saffir, M., and Thurstone, L. L. Computing
diagrams for the tetrachoric coefficient of
correlation. Chicago: Univ. of Chicago Bookstore,
1933.

3. Cobb, P. W. The limit of usefulness of accident
rate as a measure of accident proneness. J.
appl. Psychol., 1940, 25, 154-159.

4. Dunbar, Flanders. Medical aspects of accidents
and mistakes in the industrial army and in the
armed forces. War Medicine, 1943, 4, 161-175.

5. Guilford, J. P., and Lyons, T. C. On determining
the reliability and significance of a tetrachoric
coefficient of correlation. Psychometrika, 1942,
7, 243-249.

6. Kelley, T. L. Statistical method. New York: Macmillan,
1924.

7. Mintz, A., and Blum, M. J. A re-examination of
the accident proneness concept. J. appl. Psychol.,
1949, 33, 195-210.





The Rank-Comparison Rating Method 


Reign H. Bittner 
The Prudential Insurance Company of America 


and 
Edward A. Rundquist 


Personnel Research Section, AGO 


Validating tests or other measures of apti- 
tude for an industrial job always begins with a 
search for an adequate criterion. A variety of 
measures have been tried as the yardstick of 
job success: turnover, absenteeism, medical 
visits, production records, merit ratings and 
specially devised ratings. Turnover, absen- 
teeism and medical records may be adequate 
criteria for validating personality tests or ap- 
plication blank material. These types of 
criteria will not be discussed here as our con- 
cern is with predicting ability to learn to do 
a job. 

Production measures would appear to be 
the ideal criterion for validating tests of apti- 
tude. Unfortunately, however, production is 
affected by so many things not under the con- 
trol of the individual that such records rarely 
reflect accurately what an individual can do. 
They are affected by variation in the quality 
of the material handled, by the pace of the 
machine, by the pace of others in the work 
group, by the correlation of job assignment 
with length of service, and even by the correla- 
tion of job assignment with ability to do the 
job. These and other factors operate almost 
universally to make production records inade- 
quate as a criterion of ability to perform a job. 
Even when such records are collected over a 
long period of time the factors mentioned do 
not necessarily average out. 

Merit ratings are often available as a possible 
yardstick of job success. However, upon ex- 
amination they usually turn out to be of little 
use. First, merit rating procedures usually 
have several purposes only one of which is to 
get a measure of the person’s ability to do the 
job. Other purposes such as aiding the super- 
visor to deal more effectively with his people, 
building morale, etc., affect the ratings in ways 
making them undesirable as criteria. Second, 


the distribution of merit ratings is often so 
skewed that their use as a discriminative 
criterion is ruled out. Third, most merit 
ratings are collected under such uncontrolled 
conditions and with so little training of the 
raters that they become valueless as criterion 
measures. It should be noted, however, that 
these defects are not necessary as both Ferguson
(5) and Mahler (7) have shown.

Specially devised ratings are commonly used
as criteria since merit ratings and production 
measures cannot be relied upon to furnish a 
good measure of individual performance. 
These special ratings have been of many kinds. 
Probably most common are the variations of 
the rank order or paired-comparison methods. 
The writers have developed the rank-compari- 
son rating method, a new method of obtaining 
a rating criterion which has proved both 
practical and valuable in many industrial re- 
searches. It is the purpose of this paper to 
describe the rank-comparison rating method 
and to present data concerning certain char- 
acteristics of ratings obtained by this method. 


The Rank-Comparison Rating Method 

The rank-comparison rating method combines
features of the ranking and paired-comparison
methods. It involves the following
general steps:

1. Separation of the total group into random 
sub-groups. 

2. Ranking within sub-groups. 

3. Successive merging of sub-groups by a 
modified paired-comparison method. 

The end result is a ranking of the total group 
from best to poorest achieved without the 
laborious comparisons involved when large 
groups are handled by the straightforward
paired-comparison method or the confusion 
that arises in trying to rank a large group. 



Reign H. Bittner and Edward A. Rundquist 


Preliminary Preparation 

The preliminary preparation required before 
contacting the rater is as follows: 

1. Prepare a small name card for each person 
to be rated. 

2. Arrange the cards in alphabetical order. 

3. Divide the rater’s total cards at random
into two or more sub-groups of 15 to 20 cards
each, keeping the groups as nearly equal in
size as possible. First decide how many
sub-groups are needed. Then take the alphabetically
arranged pack of cards and deal out
the required number of groups as in dealing
hands in a card game, dealing one card at
a time.

It is convenient, though not necessary, to 
divide the total group into 2, 4, or 8 sub- 
groups. This makes it possible to work with 
the same number of cards in each group at 
each stage of the merging process. 
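
The dealing step can be sketched in a few lines of Python. This is only an illustration of the round-robin deal described above; the function name, the `max_size` parameter, and the worker names are hypothetical and do not come from the paper.

```python
def deal_into_subgroups(names, max_size=20):
    """Deal alphabetized name cards one at a time into near-equal sub-groups."""
    cards = sorted(names)  # step 2: alphabetical order
    # Assumption: use the fewest sub-groups (at least two) that keeps
    # every sub-group at or below max_size cards (ceiling division).
    n_groups = max(2, -(-len(cards) // max_size))
    groups = [[] for _ in range(n_groups)]
    for i, card in enumerate(cards):
        groups[i % n_groups].append(card)  # deal like hands in a card game
    return groups


# Hypothetical roster of 75 persons to be rated.
names = [f"Worker {i:02d}" for i in range(1, 76)]
groups = deal_into_subgroups(names)
print([len(g) for g in groups])  # [19, 19, 19, 18]
```

Dealing from an alphabetical pack is what makes the sub-groups effectively random with respect to ability while keeping their sizes within one card of each other.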


Obtaining the Ratings 


The ratings are obtained during a conference 
with the rater. The procedure is as follows: 


Initial Judging 


1. Explain carefully and exactly to the rater 
what is to be considered in judging the persons 
to be rated. 

2. Lay out the cards for the first sub-group 
alphabetically in front of the rater in one or 
two columns depending on the size of the group. 

3. Ask the rater to choose the best person in 
the group, emphasizing again the basis on 
which the choice is to be made. 

4. Place the card of the person chosen as best 
at the top of a new column of cards. 

5. Ask the rater to choose the poorest person 
in the group. 

6. Place the card of the person chosen as 
poorest at the bottom of the new column of 
cards. 

7. Ask the rater to choose the best person re- 
maining in the group. Place this person’s 
card in the new column of cards under the one 
previously placed at the top of the column. 

8. Ask the rater to choose the poorest person 
remaining in the group. Place this person’s 
card in the new column of cards above the card 
already at the bottom of the column. 


9. Continue this process, alternately choosing 
best and poorest persons among the cards re- 
maining in the original group and building 
the new column by placing each card selected 
as best under the last card placed at the top of 
the column, and each card selected as poorest 
above the last card placed at the bottom of the 
column. It should be obvious at this point 
that the cards are being placed in rank order 
by building from both ends toward the middle. 

10. When all cards have been transferred to 
the new column by this alternating selection 
procedure, ask the rater to check over the way 
the persons have been ranked making any ad- 
justments considered necessary. Permit per- 
sons to be moved up or down in rank if the 
rater so desires. Ordinarily, few adjustments 
will be made. 

11. Pick up the cards, keeping them in rank
order from best of all on top to poorest of all
on the bottom. Put this pack of cards aside
for the present.


12. Repeat steps 2-11 for each of the other 
sub-groups. 
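
Steps 3 to 9 build the ranking from both ends toward the middle. A minimal sketch follows, assuming the rater's judgment can be stood in for by two callables, `pick_best` and `pick_poorest`; these names and the numeric "ability" values are hypothetical and serve only to simulate a rater.

```python
def alternate_rank(cards, pick_best, pick_poorest):
    """Rank cards by alternately removing the best and the poorest remaining."""
    remaining = list(cards)
    top, bottom = [], []   # top grows downward, bottom grows upward
    choose_best = True
    while remaining:
        if choose_best:
            chosen = pick_best(remaining)
            top.append(chosen)       # placed under the last "best" card
        else:
            chosen = pick_poorest(remaining)
            bottom.append(chosen)    # placed above the last "poorest" card
        remaining.remove(chosen)
        choose_best = not choose_best
    return top + bottom[::-1]        # best of all first, poorest of all last


# Simulated rater: a hidden ability value per person (illustrative only).
ability = {"Adams": 3, "Baker": 9, "Clark": 5, "Davis": 1, "Evans": 7}
ranked = alternate_rank(
    sorted(ability),
    pick_best=lambda cards: max(cards, key=ability.get),
    pick_poorest=lambda cards: min(cards, key=ability.get),
)
print(ranked)  # ['Baker', 'Evans', 'Clark', 'Adams', 'Davis']
```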

Merging


13. The next steps in the procedure involve 
the merging of the sub-groups so that the end 
result is the total group ranked in order from 
best-of-all to poorest-of-all. The merging pro- 
cedure will be described for any two sub- 
groups. If there are more than two sub- 
groups, the steps in merging are shown in 
Table 1 for any number of groups up to six. 
For example, if there are three sub-groups, 
merge sub-groups 1 and 2, and then use the 
same procedure to merge sub-group 3 with the 
total of sub-groups 1 and 2. 

14. Place the cards for sub-groups 1 and 2 
before the rater in two stacks, each stack 
arranged in rank order from best person on top
to poorest person on the bottom. The rater is
directed to consider the two persons whose
names show on top of the two stacks of cards.
Ask the rater, “Which of these two is the 
better?” The card chosen is removed and 
placed upside down on the table to begin a new 
stack of cards. 

15. Ask the rater, “Which of the two now 
showing is the better?” The card chosen is 
removed and placed upside down on the new 
stack. 





16. Continue the procedure, each time asking
the rater to choose the better of the two names
showing and placing the card chosen upside
down on the new stack of cards. Usually, the
question need not be asked more than twice;
the rater quickly grasps the idea and proceeds
through the comparisons without need of
further prompting. When all cards have been
transferred to the new stack, the two groups
will have been merged into a single combined
group which is ranked in order from best to
poorest.

17. Continue merging groups according to
the merging steps shown in Table 1. In
merging each pair of groups, repeat the procedure
given in 14, 15, and 16 above.


Table 1
Showing Steps in Merging

Steps in    Groups to Merge at Each Step with Varying Numbers of Sub-Groups *
Merging     3 Sub-Groups      4 Sub-Groups          5 Sub-Groups            6 Sub-Groups

1           G1 with G2        G1 with G2            G1 with G2              G1 with G2
2           G(1+2) with G3    G3 with G4            G3 with G4              G3 with G4
3                             G(1+2) with G(3+4)    G(3+4) with G5          G5 with G6
4                                                   G(3+4+5) with G(1+2)    G(1+2) with G(3+4)
5                                                                           G(1+2+3+4) with G(5+6)

* Sub-groups identified as G1, G2, G3, etc.; G(1+2) is the combined group after
sub-groups 1 and 2 have been merged, etc.


Final Check 


18. At the conclusion of the merging process, 
the total group will have been placed in rank 
order from best to poorest. As a final check on 
the rank order, lay out the cards in rank order 
before the rater and ask him to make any ad- 
justments felt to be necessary. The rater is 
permitted to move persons up or down in rank 
if desired. Experience has shown that few 
adjustments are made, but it is desirable to 
give the rater this opportunity. 
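
The merging of two ranked stacks in steps 14 to 16 behaves like the merge step of a merge sort, with the rater supplying each comparison. The sketch below is one way to express it; the `is_better` callable stands in for the rater, and the rule that leftover cards are transferred without further questions once one stack is empty is an assumption, since the paper does not spell that case out.

```python
def merge_ranked(stack_a, stack_b, is_better):
    """Merge two stacks, each ranked best-first, by comparing the top cards."""
    merged, i, j, questions = [], 0, 0, 0
    while i < len(stack_a) and j < len(stack_b):
        questions += 1  # "Which of these two is the better?"
        if is_better(stack_a[i], stack_b[j]):
            merged.append(stack_a[i])
            i += 1
        else:
            merged.append(stack_b[j])
            j += 1
    # Assumption: once one stack is empty, the other is moved over as is.
    merged.extend(stack_a[i:])
    merged.extend(stack_b[j:])
    return merged, questions


# Simulated rater with a hidden ability value per person (illustrative only).
ability = {"Adams": 3, "Baker": 9, "Clark": 5, "Davis": 1, "Evans": 7}
g1 = ["Baker", "Clark", "Davis"]   # already ranked best to poorest
g2 = ["Evans", "Adams"]            # already ranked best to poorest
merged, questions = merge_ranked(g1, g2, lambda a, b: ability[a] > ability[b])
print(merged, questions)  # ['Baker', 'Evans', 'Clark', 'Adams', 'Davis'] 4
```

Because only the two top cards are ever compared, merging stacks of sizes a and b never takes more than a + b - 1 questions.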


Statistical Treatment of the Ratings 


The procedure results in rank-order ratings
which are more amenable to statistical treatment
if converted to another scale. In our
studies, it has been found most useful to convert
the rank-orders to standard scores. The
method described by Garrett (6) has been 
used in making this conversion. 
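
The paper cites Garrett (6) for the conversion but does not reproduce it. As a rough illustration only, the sketch below converts each rank to a percent position and then to a normal deviate; the midpoint formula (R - 0.5)/N and the use of z-scores as the final scale are assumptions, not necessarily the exact table the authors used.

```python
from statistics import NormalDist


def ranks_to_standard_scores(n):
    """Return a z-score for each rank 1..n (rank 1 = best = highest score)."""
    normal = NormalDist()  # mean 0, standard deviation 1
    scores = []
    for rank in range(1, n + 1):
        # Assumed percent position: share of the group ranked below
        # the midpoint of this rank.
        proportion_below = 1 - (rank - 0.5) / n
        scores.append(normal.inv_cdf(proportion_below))
    return scores


z = ranks_to_standard_scores(5)
print([round(v, 2) for v in z])
```

The transformation spreads the extreme ranks apart and bunches the middle ranks together, which is the usual reason for preferring it to raw rank numbers in correlational work.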


Characteristics of Rank-Comparison Ratings 


Reliability. The rank-comparison ratings
are quite reliable in the sense that the same 
raters given the same directions will give 
essentially the same ratings even after a con- 
siderable lapse of time. For example, a fore- 
man rated 75 factory women and rerated them 
three months later. The correlation between 
the two ratings was .92 (11). Another fore- 
man rated and rerated 31 factory women one 
month apart. The correlation between the 
two ratings was .89 (2). 
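
A rate-rerate coefficient of this kind can be computed as an ordinary product-moment correlation between the two sets of scores. The paper does not state which coefficient was used, so the sketch below is an assumption, and the two short rankings are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical ranks of five persons on a first rating and a rerating.
first_rating = [1, 2, 3, 4, 5]
rerating = [2, 1, 3, 4, 5]
r = pearson_r(first_rating, rerating)
print(round(r, 2))  # 0.9
```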

The importance of measuring reliability of 
ratings under the same conditions is illustrated 
in the first study cited above. In an effort to 
get raters to control their bias favoring long- 
service employees, a straight paired-comparison 
method was tried along with special efforts to 
make the rater discriminate between ability 
and length of service of the people rated. The 
correlation between ratings by the rank-com- 
parison method and the paired-comparison 
method was .26. This is a marked change
from the .92 obtained when the same method and
the same directions were used. Other foremen
changed as noted in the study, but not to the 
same degree as this particular one. 

Agreement Among Raters. Agreement among 
raters is more likely to be a function of factors 
other than the method of rating. The degree 
of knowledge each rater has of the people 
rated and the varying standards used by the 
raters in judging performance are particularly 
important factors. It is of some interest, 
however, to note the degree of agreement 
achieved when the rank-comparison method is 
used. Forty-eight factory women were rated
by their foreman, by the assistant foreman, and by
nine inspectors (9). Each inspector rated from
4 to 40 of the women. The intercorrelations
between ratings were: foreman and assistant
foreman +.65; foreman and “average inspector”
+.67; and assistant foreman and
“average inspector” +.73.

In another study, 97 supervisors and foremen
were rated by three raters on personality
suitable for supervision (4). The intercorrelations
between raters were: plant manager and
personnel director +.63; plant manager and
training director +.45; and personnel director
and training director +.53.

Agreement among the raters in these two 
studies is moderate to fairly high. Compar- 
able results have been found in other studies. 
This agreement can in no sense be interpreted 
as reliability of the ratings. However, if there 
were no agreement among raters who really 
knew their people and who were reasonably 
consistent in their rating standards, it would 
raise questions concerning the adequacy of 
the method. 

Relation to Paired-Comparison Ratings. A 
completely controlled experimental comparison 
of the rank-comparison method with the paired- 
comparison method is not available. Two 
studies are available from which inferences as 
to the relationship between the two methods 
can be drawn. The results of the two studies 
indicate that if all conditions are controlled 
the two methods give essentially the same 
results.

A study previously cited (11) involved rating 
of four groups of factory women by the rank- 
comparison method and rerating seven months 
later with a partial paired-comparison technique. 
In the rerating technique, the sub-groups set 
up in accordance with the rank-comparison 
method were ranked by a straight paired- 
comparison technique after special directions 
were given emphasizing that the rater should 
discount as much as appropriate the length 
of time persons had been on the job and rate 
solely on actual ability to perform the job. 
(These special directions were not given in the
original rank-comparison rating.) The sub-groups
were then merged according to the rank-comparison
technique. The rating and rerating
correlations for the four raters are: Rater 1
(N=75) +.26; Rater 2 (N=82) +.90; Rater
3 (N=64) +.84; and Rater 4 (N=67) +.70.
The correlations between the two ratings are 
high in all but one case. The low correlation 
for Rater 1 appeared on investigation to be due 
to the change in directions on evaluating length 
of time on the job rather than to the paired-comparison
technique used in ranking the sub-groups.
It would seem then that to the extent
the paired-comparison technique was involved
it did not markedly change the ratings. 

A second study (1) involved rating of 18 
factory women. The rank-comparison method 
was modified slightly as follows: sub-groups 
were ranked by their foremen and the depart- 
ment supervisor then merged the sub-groups 
in the standard way. The department super- 
visor several days later then rated the 18 
women by a straight paired-comparison tech- 
nique. The correlation between the rank- 
comparison and the paired-comparison ratings 
was .97. Obviously, there is no essential 
difference in the ratings obtained by the two 
methods. 

Relation to Rating Scale Ratings. A study 
in the selection of supervisors (4) involved the 
use of several types of criteria of personality 
suitable for supervision: rank-comparison, a 
9-trait rating scale with each trait rated on a 
5-point scale, and a 2-point (above and below 
average) rating scale on overall personality for 
supervision. Two raters rated by rank-com- 
parison and the 2-point overall rating methods 
all of the 96 supervisors they knew well enough 
to rate. In addition, they rated the super- 
visors immediately under them on the 9-trait 
scale. The correlations between the rank-comparison
and the other two types of ratings
for each rater are: Rater 1, rank-comparison vs.
two-point overall scale (N=92), bi-serial
r=.93; Rater 1, rank-comparison vs. nine-trait
scale (N=19), r=+.88; Rater 2, rank-comparison
vs. two-point overall scale (N=96), bi-serial
r=+.75; and Rater 2, rank-comparison
vs. nine-trait scale (N=13), r=+.77.

The rank-comparison method gives results 
closely comparable to the other two methods 
for Rater 1 and quite comparable for Rater 2. 
Depending on the type of rating desired from 
other considerations, it is indicated that the 
rank-comparison method may be used with 
confidence instead of rating scale methods. 

Relation to Production Criteria. Rank-com- 
parison ability ratings and production criteria 
should be closely related if both are valid and 
reliable measures. The usual inadequacies of 
production criteria mentioned earlier were all 
present in such criteria available in our studies. 
Thus, a close relation between the two types of 
criteria was not expected. However, some 
evidence has been found to indicate that as the
worker's individual control over production increases,
the relationship increases between
production measures and the ratings. In one 
study where the individual’s production was 
largely machine-paced and where there was no 
opportunity for differences in type of material 
handled to cancel out, the correlation between 
the production criterion and the foreman’s 
rating was .24 (10). In another situation the 
production of the individual was considerably 
more under the worker’s control (9). In this 
situation where each worker could set her own 
pace, correlations between ratings and average 
pay period efficiency were .84 for the inspector’s
rating, .73 for the assistant foreman’s rating, 
and .78 for the foreman’s rating. In a third 
study where again production was fairly well 
controlled by the worker, the correlations between
supervisors’ ratings and three production
measures were as follows: .66 with production 
speed; .50 with production quality, and .70 
with overall production efficiency (1). These
findings suggest that the rater makes allowances
for difficulty of job assignment, quality of
products handled, and the like in making his
ratings. They also suggest that with proper care,
ratings can be obtained which will be of almost
as great value as adequately controlled produc- 
tion measures for the purpose of validating 
tests. This is an important consideration 
where adequate production records are not 
routinely kept since ratings can be obtained 
much more cheaply. 

Predictability of Rank-Comparison Ratings. 
It might be expected that ratings would be 
more predictable than production records if 
raters make allowances for the factors that 
often operate to make production records in- 
adequate criteria of the individual’s worth on 
the job. There is some evidence that this is 
the case. In the first study cited above (10) 
involving 63 factory women, a test battery 
was developed that correlated .26 with the 
production measure. However, it was possible 
to develop a test battery that correlated .47 
with the rating criterion. In the second study 
cited above (9), 37 women were tested. These 
were divided into two groups, one of 20 women 
with less than ten months’ service, and one of
17 women with ten months or more of service. 
The test battery was developed on the basis of 
an average production and rating criterion. 











Table 2
Correlations between Test Battery and Various Criteria

                                          Test-Criterion Correlations
                                          Short        Long         Total
                                          Service      Service      Group
Criterion                                 (N = 20)     (N = 17)     (N = 37)

Ave. Production and Rating                .61          [illegible]  .49
Ave. Foreman and Ass’t Foreman Rating     .70          .45          .49
Production Efficiency*                    .50          .38          .41
Ave. Inspector’s Rating                   .41          .42          .29

* The average of 10 to 26 pay-period efficiency indices based on standards
set by Industrial Engineering Department.


The correlations of the test battery with the 
various criteria for the short and long service 
groups are shown in Table 2. 

The most predictable single criterion is the 
average rating of the assistant foreman and the 
foreman. Efficiencies, however, tend to be 
more predictable than the inspectors’ ratings. 
Perhaps the inspectors whose job is to detect 
violations of quality standards are overly in- 
fluenced by the knowledge of defects found and 
do not make sufficient allowances for the 
difficulty of the job or for varying lengths of 
service. They seem confused by the length 
of service factor. Instead of a correlation for 
the total group somewhere in between those for 
the short and long service groups as was found 
for the other criterion measures, the correlation 
for the inspectors is lower for the total group. 


Advantages of the Rank-Comparison Method 


Traditionally, ranking methods have been 
considered as a substitute for the paired-com- 
parison method. From the evidence presented 
above, the rank-comparison combination of the 
two methods appears to yield as satisfactory 
a result as the paired-comparison method. 
Moreover, it has four great operational ad- 
vantages: (1) it is easily understood by the 
raters; (2) raters like the method and have con- 
fidence in it; (3) it can be applied to large 
groups; and (4) it requires very little of the 
rater’s time.

The method is easily understood by the 
rater. At least 50 department heads, foremen 
and inspectors have rated their people with this 
method. Many of these raters had little verbal
facility, but no difficulty in comprehension
has been encountered. In fact, the method of 
merging the ranked sub-groups was invented 
when one rater was unable to understand a 
system whereby more than two names were 
exposed simultaneously. Once the present 
method had been devised, however, the rater 
proceeded with no further difficulty. 

Raters like and have confidence in the 
method. Their reaction has always been 
favorable. Not only is the method liked by 
the raters but it invokes in them a feeling of 
confidence that the ratings are accurate mea- 
sures of ability. Many have commented 
that this method should be used in giving merit 
ratings, although the writers do not agree 
with this. 

The method can be used with large groups. 
When groups of 100 or more are to be rated, the 
paired-comparison method obviously becomes 
too unwieldy. With 100 cases it would in- 
volve 4,950 comparisons. It is also difficult to 
attempt to rank this many people. To over- 
come this difficulty, large groups are often 
divided into random sub-groups of equal size 
and then either the rank order or paired-comparison
method is used within sub-groups.
This involves the assumption that sub-groups 
are truly random groups. The rank-com- 
parison method gets away from such problems 
and such assumptions. It does not matter 
how large the group is; it can be ranked from 
1 to N by this method with very little compli- 
cation. The feature that makes this possible 
is that the rater never has to consider more 
than 20 persons at one time. In practice, it 
was discovered that when more than 20 cards
were placed before the raters, they sometimes
became confused and were hesitant in making 
the original rankings. With 20 or fewer cases, 
no difficulty has been encountered. 
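
The figure of 4,950 is the number of distinct pairs among 100 persons, 100 × 99 / 2. The sketch below checks that figure and, for contrast, gives an upper bound on the "which is better?" questions needed in the merging stage. The five sub-groups of 20 and the merge order used here are illustrative assumptions, and the count excludes the initial best-and-poorest choices made within each sub-group.

```python
def paired_comparisons(n):
    """Number of distinct pairs when every person is compared with every other."""
    return n * (n - 1) // 2


def max_merge_questions(group_sizes):
    """Upper bound on merge questions when ranked sub-groups are merged in pairs.

    Merging stacks of sizes a and b needs at most a + b - 1 comparisons.
    Assumption: groups are merged two at a time, with each combined group
    sent to the back of the queue.
    """
    sizes = list(group_sizes)
    total = 0
    while len(sizes) > 1:
        a, b = sizes.pop(0), sizes.pop(0)
        total += a + b - 1
        sizes.append(a + b)
    return total


print(paired_comparisons(100))          # 4950
print(max_merge_questions([20] * 5))    # 236
```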

The method requires very little of the rater’s 
time. While no systematic records have been 
kept, instances are known where close to
100 people have been ranked in less than 15
minutes. 


Conclusion 


Ratings, with all their difficulties, are the 
most common criterion data in industrial 
personnel research. Not only are they the 
most common but they are probably the best 
under most circumstances. Production re- 
cords, if obtained under conditions of equal 
training and control of all factors affecting 
individual performance except individual dif- 
ferences in ability, would be the ideal criterion. 
With the rare exception of some training pro- 
grams (8), such conditions are seldom met. 

The rank-comparison method of rating 
presented here is a practical and useful method 
which has certain advantages over other 
methods. It does not, however, solve all the 
problems incident to the use of ratings as 
validation criteria. Since in the majority of 
industrial situations the experimenter is reduced
to the use of ratings, it would seem useful
to devote more study to methods of obtaining
these ratings. The solution of these problems 
awaits further research. 


Received August 12, 1949. 


References 


1. Bittner, R. H. The selection of bottle decorating
machine operators. (Unpublished study.)

2. Bittner, R. H. The selection of bottle inspector-packers.
(Unpublished study.)

3. Bittner, R. H. The selection of handyman-inspectors.
(Unpublished study.)

4. Bittner, R. H., and Rundquist, E. A. Development
of a supervisor personality test. (Unpublished
study.)

5. Ferguson, L. W. The effect upon appraisal scores
of individual differences in the ability of superiors
to appraise subordinates. Personnel Psychol.,
1949, 2, 377-382.

6. Garrett, H. E. Statistics in psychology and education.
New York: Longmans, Green and Co.,
1937. Pp. 168.

7. Mahler, W. R. An experimental study of two
methods of rating employees. Personnel, 1948,
25, 211-220.

8. McGehee, W. Cutting training waste. Personnel
Psychol., 1948, 1, 331-340.

9. Rundquist, E. A. Predicting success of glass
selectors. (Unpublished study.)

10. Rundquist, E. A. The selection of tumbler inspector-packers.
(Unpublished study.)

11. Rundquist, E. A., and Bittner, R. H. Using
ratings to validate personnel instruments: a
study in method. Personnel Psychol., 1948, 1,
163-183.











Validity of an Objectivity Key on a Short Industrial 
Personality Questionnaire * 


Edward R. Carr and Harold F. Rothe 


Stevenson, Jordan and Harrison, Inc., Chicago, Illinois 


In a previous paper, Rothe described the use 
of an Objectivity key on a short industrial 
personality questionnaire (3). The Objec- 
tivity key described there and referred to in 
this paper consists of six items patterned after 
the L scale of the MMPI (1). The Objectivity 
key was shown to permit the adjusting of 
scores on some other keys so that highly ob- 
jective persons would be compared with norms 
based upon other highly objective persons; 
persons of medium or low objectivity would 
also be compared with their appropriate groups. 
This technique is intended to minimize the 
effects of “faking” on questionnaires. 

This technique is only valuable, however, 
when the Objectivity key is valid, and when 
the other keys are valid. A valid Objectivity 
key is one that indicates which persons are 
being objective and which ones are not being 
objective while answering the questionnaire. 
A non-objective person is one who is “putting 
his best foot forward” and attempting to “look 
good” on the questionnaire. A highly objective
person is one who is extremely frank while
answering the questionnaire, not attempting 
to hide what are apparent faults or weaknesses. 

The purpose of the present paper is to 
present some data that indicate that the 
Objectivity key used in this study does 
separate highly objective from non-objective 
respondents. 


Experimental Technique 


The industrial personality questionnaire (3)
was administered three times to a group of
fifty college students. Different instructions
were given orally to the group each time. The
first instructions were to “fake the questionnaire”
so as to “look good” for a job for which
they were to assume they were applying.
The second instructions, given after all students
had finished the form (in about five
minutes), were to “fake the form to look bad.”
The third instructions, given after all students
had finished the form the second time, were
to “be as honest as possible.”

* The authors wish to acknowledge the assistance of
Miss Judy Yackle in preparing this paper.

As far as could be observed, good rapport 
existed between administrator and students, 
and it was believed that the students were 
cooperating. The students were freshmen, 
chiefly in an engineering curriculum.¹ The
technique of several administrations with different
instructions is generally similar to the
technique used by Giese and Christy, reported
in Tiffin (4) and by Longstaff (2). The results
(means and s.d.’s) for the four personality keys
for the three kinds of test administration are
shown in Table 1. The actual distributions of
these scores are filed in ADI.²


Results on The Objectivity Key 


As Table 1 shows, the responses of the students
on the Objectivity scale varied from
trial to trial and in the anticipated direction.
That is, when told to “fake to look good” the
students obtained very low Objectivity scores.
When they faked to make themselves “look
bad” they obtained very high Objectivity
scores. When they were “honest” they obtained
the pattern of Objectivity scores, resembling
a normal curve, that is customarily
obtained in samples of industrial personnel.

Table 1
Means and S.D.’s of Four Personality Keys for Three
Kinds of Test Administration
Note: N = 50 engineering college freshmen.

Key and Instructions        Mean

Objectivity:
  Look good                 [illegible]
  Look bad                  [illegible]
  Be honest                 [illegible]
Emotional Score:
  Look good                  3.6
  Look bad                  10.6
  Be honest                  5.2
Social Dominance:
  Look good                  7.9
  Look bad                   1.2
  Be honest                  6.1
Drive Scores:
  Look good                  6.1
  Look bad                   4.7
  Be honest                  6.3

[S.D. column illegible]

¹ The writers wish to thank Dr. Ernest McCormick,
Department of Psychology, Purdue University, for
permitting this experiment to be conducted in one of
his classes.

² Tables 2, 3, 4, and 5 have been deposited with the
American Documentation Institute and may be obtained
by ordering Document No. 2723 from American
Documentation Institute, 1719 N Street, N.W., Washington 6,
D. C., remitting $.50 for microfilm (images 1
inch high on standard 35 mm. motion picture film) or
$.50 for photocopies (6 X 8 inches) readable without
optical aid.

The critical ratio between “good” and “bad”
was 17.27; between “good” and “honest” was
6.05; and between “bad” and “honest” was
8.68. All of these differences are significant.
It is apparent, then, that college students can 
vary their Objectivity scores, depending upon 
the instructions given them, and presumably 
depending upon their set. Since the changes 
in score are in the direction that would be ex- 
pected, according to the theory underlying the 
Objectivity key, or the MMPI L-Scale, it may 
be concluded that the Objectivity key does 
provide a measure of the extent to which the 
respondents are “faking.” A scale with a 
greater range would, of course, provide a more 
adequate measure. 
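
The paper reports critical ratios without giving the formula. Since the same fifty students took the questionnaire under all three instructions, one plausible reading is the ratio of the mean difference to its standard error for paired scores. The sketch below follows that reading; the formula choice is an assumption, and the scores are invented for illustration rather than taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def critical_ratio_paired(scores_a, scores_b):
    """Mean of paired differences divided by the standard error of that mean."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error


# Hypothetical Objectivity scores for five students under two instructions.
look_bad = [6, 5, 6, 4, 5]
look_good = [1, 0, 2, 1, 1]
cr = critical_ratio_paired(look_bad, look_good)
print(round(cr, 2))
```

A ratio of about 3 or more was conventionally treated as significant in work of this period, which is consistent with the authors calling ratios of 6 to 17 significant.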


Results on the Emotional Key 


In the previous paper it was shown that re- 
spondents with low Objectivity scores tended to 
have low scores on the so-called Emotional key. 
Respondents with high Objectivity scores had 
high Emotional scores. Accordingly the Ob- 
jectivity score was found to be useful in inter- 
preting the Emotional score (i.e., by making 
adjustments upwards or downwards in the 
Emotional score). 

The students’ Emotional scores showed the 
same relationships (see Table 1). The stu- 
dents, when “looking good,” were being non- 
objective in order to “look good,” and showed 
low Emotional scores. The mean score was
3.6. When the students were “looking bad,”
the mean Emotional score was 10.6. When 
they were “honest,” the mean was 5.2. The 
differences between these three conditions are 
again all significant, the critical ratios being 
19.55 between “good” and “bad,” 3.51 be- 
tween “good” and “honest,” and 13.75 be- 
tween “bad” and “honest.” It is, therefore, 
apparent that college students can vary their 
Emotionality scores, depending upon their in- 
structions and presumably upon their sets. 
It may also be concluded that low Emotional 
scores may be associated with low Objectivity 
scores, and high Emotional scores may be 
associated with high Objectivity scores. 

There is a possibility that there is an in- 
evitable relationship between Objectivity and 
Emotional scores and that both are measuring 
the same thing. Data to be published shortly 
rule out this possibility. Leaving that ques- 
tion for the moment, it has been shown here 
that the Objectivity key measures what it 
purports to measure and that the technique of 
interpreting Emotional scores within a frame 
of reference established by the Objectivity 
scores is a valid one. 


Results on the Social Dominance Key 


Social Dominance scores were not found 
to vary with Objectivity scores for the in- 
dustrial sample previously reported. Inter- 
estingly enough, for the college sample dis- 
cussed in this paper, Social Dominance scores 
were found to vary with the administrative 
instructions, and also with the Objectivity 
scores. 

The mean Social Dominance score, under the
“look good” conditions, was 7.9; the mean
when “looking bad” was 1.2; the mean when
“being honest” was 6.1 (see Table 1). The
critical ratios are: 27.86 between “good” and
“bad”; 5.15 between “good” and “honest”;
and 13.15 between “bad” and “honest”; and
all of the differences are significant.

These results are particularly interesting in
view of the fact that the students, when
“honest,” give essentially the same distribution
of Social Dominance scores as does the industrial
sample previously reported. That is,
college students apparently conceive of “Social
Dominance” as being a desirable set of habits
for jobs to which they might aspire (“look
good”), but industrial personnel of various
categories do not “fake” questionnaires to
make themselves appear more highly socially
dominant than they are. That is a highly
interesting point deserving more research.³


Results on the Drive Key 


The industrial sample previously reported 
did not show a variation in Drive scores with 
Objectivity scores. The college students re- 
ported here do show some relationship be- 
tween these two keys. 

The mean score on Drive when “faking
good” was 6.1; the mean when “faking bad”
was 4.7; and the mean when “honest” was 6.3
(see Table 1). The critical ratios are: 5.68
between “good” and “bad”; 0.42 between
“good” and “honest”; and 4.73 between “bad”
and “honest.” The difference between “good”
and “honest” is not significant but the other
two differences are significant. Thus, the
college students could “fake” the Drive key
to show that they had little drive, but they
could not, or did not, “fake” to show more
drive than they actually possess.

There are two additional features about the
college students’ not “faking” a higher drive
than they did in order to look good. One is
that their Drive scores, while “honest,” were
higher than while faking to look good, although
insignificantly so. This leads to the possibility
that this key is too subtle to fake. Unfortunately,
the writers have no data to answer
that question.

The second feature is the possibility that 
the students actually had such a high drive 
they did not believe they would have to fake it. 
However, the students did not have an unu- 
sually high distribution of Drive scores, being 
substantially the same as those of the industrial 
samples previously described. 

Accordingly it is tentatively concluded that 
the Drive key is too subtle to be faked, with the 
exception of one or maybe two items that can 
be faked to show lack of drive. 


* Neither of the groups of persons knew the names of 
the keys, but the Social Dominance items are fairly 
obvious; i.e., “I dislike walking across the middle of a 
room full of people.” 


Edward R. Carr and Harold F. Rothe 


Summary 


In a previous paper the use of an Objectivity 
key to establish a frame of reference for inter- 
preting certain other keys on a short industrial 
personality questionnaire was described. The 
present paper describes an experiment to estab- 
lish the validity of that Objectivity key. 
This experiment was conducted with college 
freshmen rather than with an industrial popu- 
lation. The experiment should be repeated 
with an industrial sample. 

When instructed to “fake” the questionnaire
to “look good” for a job for which they were
to assume they were applying, the students
obtained low Objectivity scores and low Emotional
scores. The very low Objectivity
scores would indicate to an interviewer that
these persons might be “faking,” and their
Emotional scores should be compared with
those of other persons with low Objectivity scores.

In a like manner, when the students faked to
“look bad” for the jobs, they obtained high Objectivity
and high Emotional scores, and these
could again be interrelated in interpretation.*

The particular questionnaire described here 
is used by consulting psychologists as an inter- 
view aid. It is concluded from this study that 
the Objectivity key is a valid key for locating 
“fakers” and for locating extremely frank re- 
spondents, and hence contributes to the inter- 
view. It permits the use of different norms for 
the Emotional key, based on the Objectivity 
score. 

Other findings are that college students fake 
a questionnaire to make themselves appear 
socially dominant, as a desirable job character- 
istic, and socially submissive, as an undesirable 
job characteristic. This relationship was not 
found on an industrial sample previously re- 
ported. College students fake a lack of drive 
as an undesirable characteristic, but they 
cannot, or do not, fake a high drive as desirable. 


Received August 18, 1949. 


* Readers may wonder why faking “good” and faking 
“poor” do not yield a consistent pattern on the Objec- 
tivity key. The nature of the Objectivity key, and the 
MMPI L-Scale, is such that opposite directions of 
faking result in shifts to opposite ends on the Objec- 
tivity or L-Scales. 







References

1. Hathaway, S. R., and McKinley, J. C. Manual
for the Minnesota Multiphasic Personality Inventory.
New York: The Psychological Corporation, 1943.

2. Longstaff, H. P. Fakability of The Strong Interest
Blank and The Kuder Preference Record. J.
appl. Psychol., 1948, 32, 360-369.

3. Rothe, H. F. Use of an Objectivity Key on a Short
Industrial Personality Questionnaire. J. appl.
Psychol., 1950, 34, 98-101.

4. Tiffin, J. Industrial psychology. New York: Prentice-Hall,
Inc., 1947, rev. ed., pp. 170-171.













Getting Your Message Across by Plain Talk 


Arthur O. England 
Personnel Planning Office, Air Materiel Command, Dayton, Ohio


In 1948, the Personnel Planning Office con- 
ducted an employee attitude survey in the Air 
Materiel Command (AMC). A sampling of 
our some 80,000 civilian employees revealed 
some highly interesting facts about the effec- 
tiveness of our communications. Slightly 
more than one-third of those sampled stated 
they did not know the procedure for submitting 
a grievance! The significance of those figures 
became apparent to our top management when 
they discovered that there were at least three 
different publications dealing with the subject 
of grievances. There was an Air Force Regu- 
lation, an AMC Regulation and a Civilian 
Personnel Letter distributed to each employee. 
Since approximately 70% of the civilians em- 
ployed by the Air Force work in the Air 
Materiel Command, this lack of knowledge of 
personnel procedure seemed worthy of our 
attention. We began asking ourselves just 
how many other procedures and policies of 
management were unheard of or misunderstood 
by the employees. Inasmuch as the employees 
were “informed” by our various publications, 
the difficulty might very likely lie in the 
language and style used in writing these 
directives. 


Readability of Our Publications 


Using the Flesch formula, an analysis was 
made of the readability of literally hundreds of 
Air Force Directives, Civil Service Regulations, 
Technical Orders, Maintenance Handbooks, 
employee newspapers, and so forth. Briefly,
our study showed that the Air Materiel Command
was not getting its message across to
the employees:


1. More than 90% of our people found it 
hard to read and understand our directives. 

2. Technical Orders and Maintenance Hand- 
books used too many big words. Uncommon 
non-technical words hinder the reader in 
grasping technical ideas. 


3. More than 90% of our people found it 
hard to read and understand articles in our 
civilian newspapers. 

4. Office memoranda were written in the 
third person. They were filled with trite 
phrases. It took too long to read them. 
Also, messages in that style are hard to 
remember. 

5. More than 60% of our airmen found it 
hard to read and understand directives ad- 
dressed to them. 

6. More than 20% of our Air Force officers 
found it hard to read and understand messages 
addressed to them. 


Table 1 shows the “average” reading score of
the different publications sampled. Nine different
employee newspapers from our field
installations throughout the country were
sampled.

Obviously, to be able to relate the reading
ease scores of printed material to any reading
audience, it is necessary to know the educational
background of that audience. A study
of the personnel records of our civilian employees
showed that their educational training
very closely approximated that shown for the
U.S. adult population. Thus, the U.S. census
figures used by Flesch in The Art of Readable
Writing were adapted for use in studying our
civilian audience.

Estimates of the educational levels of our
military readers are shown in Table 2.


Table 1

Reading Ease Survey of Different Types of Publications

Type of Publication          Average Reading Ease Score   Description of Style
AMC Technical Orders                                      Difficult
AMC Regulations                                           Very difficult
AMC Letters                                               Very difficult
AMC “Daily Bulletins”                                     Very difficult
Hq Office Instructions                                    Very difficult
Air Force Regulations                                     Very difficult
Air Force Letters                                         Very difficult
Civil Service Regulations                                 Very difficult
AMC employee newspapers                                   Difficult


Table 2

Cumulative Percentage of Airmen and of Officers
Having Various Amounts of Education

Educational Level            Airmen    Officers
Some grammar school          99.87
Grammar school graduate      91.97
Some high school             78.27
High school graduate
Some college
College graduate
Post graduate


Translating Research Findings into Action 


Convinced that the gobble-de-gook in government
writing was a real barrier in management’s
effort to get its message across to the
employees, we took positive steps to sell
operating officials on the merits of plain talk.
It was recognized that, without the complete 
support of top management, our re-educational 
program stood little chance of succeeding. 
Through the support of Major General J. M. 
Bevans, Chief of the Personnel and Adminis- 
tration Department, the writer presented a 
talk before top staff officers of AMC. By 
using visual aids in the nature of a large, illus- 
trated flip chart, the failure of our gobble-de- 
gook was shown. Top management enthusi- 
astically indorsed plain talk. 

For the next three months, lectures were
given on plain talk. Why it should be used,
how to use the Flesch formula, and what plain
talk would do were discussed. All major components
and divisions of the headquarters
were covered in this “educational” campaign. 
At the same time, a work book was prepared 
showing how to apply the reading ease formula 
with samples of re-writes of our various pub- 
lications. After each lecture, these work books 
were distributed to the audience. Over 2,000 
top officials were covered in these lectures. 
The basic ground work was laid for getting 
plain talk accepted by management. 
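
The reading ease formula taught in those lectures can be sketched in a few lines. The constants below are those of Flesch's published Reading Ease formula; the syllable counter is only a rough vowel-group heuristic assumed here for illustration (the work book relied on hand counts), and all function names are hypothetical.

```python
import re


def reading_ease(words, sentences, syllables):
    """Flesch Reading Ease score from raw counts."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def count_syllables(word):
    """Rough estimate: count vowel groups and drop a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def score_text(text):
    """Apply the formula to a passage using the heuristic counts above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    syllables = sum(count_syllables(w) for w in words)
    return reading_ease(len(words), len(sentences), syllables)


if __name__ == "__main__":
    sample = "Talk with your supervisor. He has been told to give a fair answer."
    print(round(score_text(sample), 1))
```

Shorter sentences and fewer syllables per word both raise the score, which is why the rewrites quoted later in this article score so much higher than the original directives.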

The next step meant overcoming “the resistance
to change” of thousands of our lower
level supervisors. It was decided to publish
an official manual to assist all those individuals 
who do any writing for the Air Materiel Com- 
mand. Attempting to practice what we were 
preaching, we prepared the AMC Manual 11-1, 
entitled, Gobble-de-gook or Plain Talk? This 
manual has a reading ease score of 72; style, 
fairly easy; audience level, 6th grade. Next, 
in an effort to attract and hold readership, the 
manual was illustrated with cartoons. Bold 
headings typical of commercial advertising 
were also used on each new topic under dis- 
cussion. Further, we felt that if this technique 
of writing our message in an easy-to-read style 
were to be accepted, some convincing selling 
had to be done. For many years the federal 
government and the Armed Forces have been 
using the same trite, unimaginative, difficult- 
to-read style of writing. Overcoming that 
habit pattern would not be easy. The selling 
points used in the manual were as follows: 

1. Gobble-de-gook is costly. We had a selling 
point that is not always applicable to private 
industry. Every word published is read
during the work day. This even applies to 
our house organs (the civilian newspapers). 
The point was made that readable language 
saves reader time. And time saved means 
money saved for management. It costs 
$24,160.00 if our employees spend only ten 
minutes of their working time reading a four- 
page directive. This is a modest estimate. It 
takes ten minutes to read and understand even 
one page of some of our gobble-de-gook writing. 
Obviously, if the message in the publication is 
not understood, then our national defense 
money is not being spent wisely. 

2. Plain talk saves reader time. Why not 
honor your reader’s time? Think of the reams 
of paper work that flow over the reader’s desk 
daily. The writers of these papers are in com- 
petition with each other for the reader’s atten- 
tion. Writing stripped of all gobble-de-gook 
stands a good chance of being read first. It’s 
brief and to the point. 

3. Plain talk style pleases your readers. Long, 
wordy sentences confuse the reader. Imagine 
trying to understand the ideas in one sentence 
of 461 words. Such a sentence was found in 
one of our military publications. The reason 
“Time” and “Reader’s Digest” are so popular
is because they please the reader. The average
sentence in these magazines has only 17 words.
They sell because people want to read them.
You have ideas to sell, too. Do you write in a
style that pleases your reading public?

4. Plain talk principles help get ideas across. 
It’s easy to get into the habit of writing for a 
mythical audience. But you are writing for 
real persons to read and understand. They 
may be busy commanders and division chiefs. 
They may be supervisors who are more inter- 
ested in getting out production than wading 
through a stack of publications. They may 
be ungraded workers who quit school at the 
seventh grade. In every case, you must decide 
who makes up your audience. Who will read 
your message? If you know the educational 
background and reading ability of your 
audience, you will be able to write so they can 
understand you. 
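
The cost figure quoted under the first selling point can be checked with simple arithmetic. The manual's own breakdown is not given here, so the sketch below assumes that all of the "some 80,000" civilian employees read the directive and works out the average hourly pay that the $24,160.00 figure would then imply. Both the head count and the resulting rate are assumptions for illustration.

```python
EMPLOYEES = 80_000      # approximate head count quoted earlier in the article
MINUTES_EACH = 10       # reading time per employee, from the manual's example
QUOTED_COST = 24_160.00  # dollars, from the manual's example

hours_total = EMPLOYEES * MINUTES_EACH / 60
implied_hourly_rate = QUOTED_COST / hours_total


def reading_cost(n_readers, minutes, hourly_rate):
    """Payroll cost of the time n_readers spend reading for `minutes` each."""
    return n_readers * minutes / 60 * hourly_rate


print(f"total reading time: {hours_total:,.0f} hours")
print(f"implied average pay: ${implied_hourly_rate:.2f} per hour")
print(f"cost at 20 minutes: ${reading_cost(EMPLOYEES, 20, implied_hourly_rate):,.2f}")
```

Under these assumptions the implied rate is a little over $1.80 an hour, and the cost scales in direct proportion to reading time.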


Examples of Using Plain Talk 

Here’s what a current Air Force directive 
says about the grievance procedure: 

“(1) An employee who has a grievance or his 
representative will normally present the griev- 
ance, in the first instance, orally to the imme- 
diate supervisor. The supervisor will consider 
it promptly and impartially, collecting the 
necessary facts and reaching a decision. If 
the employee is not satisfied with the solution 
of the problem, he will be advised that he 
may discuss the problem with the next higher 
supervisor. 

(2) If the employee feels that an interview 
with the immediate supervisor would be un- 
satisfactory, he or his representative may, in 
the first instance, present his grievance to the 
next supervisor in line. Where an employee 
feels an interview with the second supervisor 
would likewise be unsatisfactory he may seek 
counsel from the civilian personnel officer or 
his employee relations counselor, whose role 
will be to advise and aid him in facilitating the 
employee’s approach to a supervisory level 
determined appropriate by the facts in the 
particular case.” 


(Reading ease score, 24; style, very difficult; 
audience level, college graduate) 


Here's how that section of the directive 
could have been written for the employees: 




“Is something about your job bothering you?

“Here are the steps you can take to solve 
your problem. In most cases it will be solved 
at the first step. If not, you have the right 
to keep going on up to the top. You may 
present your own case or have someone do it 
for you. 

“Talk with your supervisor. He has been 
told to give a prompt and fair answer to all 
problems. Usually, a short, friendly talk 
with him will fix things up. Be honest and 
sincere when you talk with him. 

“If you feel that your supervisor will not 
handle your case fairly, you may go directly 
to his supervisor. Or, if you have gone to
your supervisor and he didn’t handle your 
problem to suit you, you may still go to his 
supervisor. 

“If you feel your case has not yet been, or 
will not be, handled fairly by either of them, go 
to your personnel technician. He can’t give 
you a final answer, but he can tell you how to 
get it.” 


(Reading ease score, 85; style, easy; audi- 
ence level, 5th grade) 
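
The style descriptions attached to the two versions follow from the score bands in Flesch's interpretation table. The lookup below reproduces those bands as commonly tabulated; the article itself confirms only the three pairings quoted in it (24, very difficult; 72, fairly easy; 85, easy), so the remaining boundaries should be treated as an assumption.

```python
# (lower bound of band, description of style), highest band first
STYLE_BANDS = [
    (90, "very easy"),
    (80, "easy"),
    (70, "fairly easy"),
    (60, "standard"),
    (50, "fairly difficult"),
    (30, "difficult"),
    (0, "very difficult"),
]


def describe_style(score):
    """Return the style description for a reading ease score."""
    for floor, label in STYLE_BANDS:
        if score >= floor:
            return label
    return "very difficult"  # scores below zero


for score in (24, 72, 85):
    print(score, describe_style(score))
```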


Tips on Writing to Be Understood 


In addition to presenting an explanation and 
examples of using the Flesch formula with 
potential reader audience, the following tips on 
writing to be understood were presented in our 
manual: 

The key question to all writing is, “What am
I trying to tell whom?” 

If you consider the following points, you 
may be fairly certain your message will be 
understood. 

Define your audience. When you sit down 
to write, the first thought that should come to 
your mind is, “Who will read it?” As a rule, 
material that is written for the base commander
with college training is not suitable for the 
ungraded employee who quit school at the 
eighth grade. Writing can be simple enough 
to be read with ease and understanding by a 
poor reader and yet be interesting enough to 
hold the attention of a good reader. 

Define your purpose. Just what are you 
trying to say? Is your purpose to get em- 
ployees to save their sick leave? Or is it to 
explain the benefits of sick leave? If you are
not clear in your mind about what you want
to say, it’s a sure bet your reader won’t know 
either. The purpose for writing is of foremost 
importance. The reader should be able to 
understand what to do, why it must be done, 
and how to do it. 

Present your ideas in logical order. A simple 
easy flow of related ideas is necessary if your 
message is to get the effect you want. Each 
part of your message should prepare the 
reader for what is to come. Don’t jump from 
one idea to another. Complete your discus- 
sion of each idea before introducing another. 
Present one idea at a time. 

Avoid unnecessary technical “niceties.”
Don’t use fine distinctions in words when they 
are not needed. Writers often spend too much 
time quibbling about technical niceties which 
have no real meaning for the reader. 

Keep the vocabulary familiar. Logic will 
not help the reader if he does not understand 
the words used. In AMC our biggest job is to 
find ways of writing about technical ideas. 
These ideas are often complicated and some 
technical words must be used. But the words 
used to modify these technical words should be 
familiar ones. Non-technical terms that will 
cause trouble for the reader should be omitted 
wherever possible. A technical idea is hard 
enough to grasp without also including hard 
non-technical words to confuse the reader. 

Use simple sentences. It was the fashion 
many years ago to write articles with sentences 
running well over 100 words. Today educators
clearly show us that writing can be more
easily read and remembered if the sentences 
are short. In order to do this, you should 
avoid using involved sentence structure. In- 
volved sentences are too much of a mental 
burden for most readers. Why ask your reader 
to expend mental effort trying to figure out 
what you are trying to say? Be brief and 
honor the reader by telling facts in short, 
brisk sentences. 

Use words of one or two syllables. Avoid 
words that will stop the reader. Use the good 
short ones that come first to your mind. 
Edit your paper from your reader’s point of 
view before signing it. These few words found 
in current AMC directives are typical of the 
“stoppers”’ we mean: 

recapitulation
idiosyncrasies
beleaguered
adjudication

“Stoppers” add to the reader's difficulty in 
getting the message. Foreign phrases should 
be avoided. Also, don’t use short words that 
are not common. Using words of one or two 
syllables is not the whole answer. The words 
must also be understood. 

There have been many noticeable improve- 
ments in our communications since the incep- 
tion of the campaign for more plain talk in 
government writing. But like all ingrained 
habits, it will take time to unlearn the old ones 
and adopt the new ones. 


Received March 13, 1950 
Early publication 










Prediction of Academic Success in Three Schools of Nursing 


Albert H. Ford 
Towson, Maryland 


There are various reasons why nurse trainees 
withdraw from schools of nursing; paramount 
among them is academic failure. In a study 
by Horner (1), reported by Potts (3), approxi- 
mately 37 per cent of a group of more than 
15,000 students admitted to schools of nursing 
over a period of years were eliminated prior to 
completing their courses. Of those eliminated 
in two years’ classes, about 30 per cent with- 
drew because of academic failure. Potts (3) 
also found classroom failure the largest single 
reason for withdrawals from schools of nursing. 

Perhaps more significant than the total 
percentage of withdrawals from schools of 
nursing is the percentage of withdrawals in the 
early stages of training. Horner (1) found 
that 63 per cent of those withdrawing did so 
in the preliminary period of training; 84 per 
cent had left the school by the end of the first
year. Potts (2) points out that in a particular 
group of 1,555, approximately 90 per cent of 
the eliminations for classroom failure came 
within the first six months of their course. 

It can be concluded from the foregoing that 
academic failure is one of the principal reasons 
for withdrawal from schools of nursing, and 
further, that failures occur early in the school’s 
program. If we assume that better selection 
techniques are able to predict in advance an 
applicant’s likelihood of academic success, then 
these techniques will save much of the expense 
ordinarily incurred by the withdrawal of un- 
successful trainees. 

It was for reasons such as these that the 
nurse training supervisors of three Knox- 
ville, Tennessee, training schools decided to 
solicit the aid of the University of Tennessee 
in the development of a more efficient selection 
program. 

Statement of the Problem 


Although the general problem involved was 
twofold—the determination of scholastic suc- 
cess, and also success in ward training or on- 
the-job training—this study was concerned 
only with the former phase. Its specific 


purpose was to determine the extent to which 
scholastic success in Knoxville hospital train- 
ing programs could be predicted from a battery 
of tests administered to entering groups. 


Subjects 


The subjects used in this study were trainees 
accepted in the training programs of the East 
Tennessee Baptist Hospital, the Fort Sanders 
Hospital, and the Knoxville General Hospital. 
The original sample consisted of 187 trainees 
from six groups admitted to the schools without 
regard to the scores made on the selection tests. 
Each hospital contributed from one to three 
groups. With the exception of one group, all 
trainees were tested within 30 days of their 
acceptance into training. The other group had 
completed from one-half to one year of its 
particular program. 

The school entrance requirements are such 
that applicants must be females between the 
ages of 17 and 35. They must have graduated 
from an accredited high school with 16 units 
of credit and be eligible for admission to the 
University of Tennessee. 


The Variables 


The original battery of predictors included 
the following measures: 


1. George Washington University Series
Reading Comprehension Test for Prospective
Nurses; Form 1, First Edition;
by Thelma Hunt.

2. George Washington University Series
Arithmetic Test for Prospective Nurses;
Form 1, First Edition; 1940; by Thelma
Hunt.

3. American Council on Education Cooperative
General Science Test; Revised
Series, Form X; by Paul E. Kambly.

4. Science Research Associates Primary
Mental Abilities; 1948; by L. L. Thurstone
and Thelma Gwinn Thurstone.

5. Kuder Preference Record; Science Research
Associates; Form BB, 1942.

6. High School Point Average.


The criterion for the study was the point
average of all scholastic grades earned by the 
trainee up to March, 1949, either at the 
hospital or on the University of Tennessee 
campus. If the trainee left the hospital prior 
to that time, her average was based on all 
courses completed prior to withdrawing. Each 
trainee took all of the tests; high school aver- 
ages were available for all trainees except 
one who was admitted on the basis of Vet- 
erans Administration high school proficiency 
examinations. 

Procedure 

Since the number of cases available (N = 187) 
did not justify the inclusion of all the separate 
measures, the most promising in terms of 
predictive power were sought on the basis of 
the validity coefficients computed for the first 
two groups. From these correlations it seemed 
likely that the best measures to include in the 
final battery were: Reading Comprehension 
Test; Arithmetic Test; Cooperative General 
Science; Total Primary Mental Abilities; High 
School Point Average; and the Science and 
Social Service scales of the Kuder Preference 
Record. 

The validity coefficients of the various Kuder 
scales were both small and inconsistent. The 
largest obtained was .34 between the Scientific 
Scale and the scholastic point averages of the 
group; however, the second group revealed a 
coefficient of only .005. The Scientific and 
Social Service scales were retained in the final 
battery more on the basis of reports in the 
literature and the significant group centile 
ratings on the measures than for their validity 
coefficients. From the validity coefficients of 
the subtests of the Primary Mental Abilities, 
it did not appear that there was any additional 
predictive power to be gained beyond that 
provided by the total of that measure. 

Similarly, in terms of the total sample 
available, it was considered advisable to com- 
bine the various groups from the three hospi- 
tals, thus considering them as derived from a 
homogeneous population. To ascertain if any 
differences did exist between the groups which 
would preclude treating them collectively
rather than as separate hospital groups, an
analysis of variance of the differences between 
groups was made on the variables to be in- 
cluded in the final battery. 

The F-ratios revealed that very significant 
differences existed on the Reading Compre- 
hension and Kuder Social Service, differences 
which were not likely to occur by chance alone 
one time in a hundred. The F-ratio found 
on the reading test was most striking and was 
the impetus for further analysis to determine 
the particular areas of greater difference. 

As was pointed out earlier, the students 
comprising one group had been in training 
from one-half to one year prior to taking the 
battery of tests, whereas the other groups 
were tested within 30 days of their admittance 
to the program. By observing the F-ratios 
depicting the within-group and between-group 
variances of all six groups, it became apparent 
that half of the total between-group variance 
was being contributed by the trained group on 
the reading test as well as on the arithmetic 
test. It seemed likely that by omitting the 
trained group, much of the variance in the 
reading test as represented by an F-ratio of 
5.31 would be eliminated. 

With the trained group eliminated, none of
the F-ratios except the one for the Scientific
scale of the Kuder Preference Record was
significant at the 1 per cent level of confidence
with 4 and 141 degrees of freedom. Four of
the measures, including the reading test, did 
not reveal differences significant at the 5 per 
cent level of confidence. The Cooperative 
General Science Test and the Social Service 
scale of the Kuder Preference Record revealed 
differences at the 5 per cent level, but not at 
the 1 per cent level. The F-ratio for the 
science test was 2.57, with a ratio of 2.44 being 
significant at the 5 per cent level with 4 and 
141 degrees of freedom. 
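
The F-ratios discussed above come from a one-way analysis of variance across the hospital groups. The sketch below shows the computation on small made-up groups, since the individual test scores are not reported; the function names and the toy data are hypothetical. The degrees of freedom quoted in the text (4 and 141) are what five groups totalling 146 trainees would give.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


def degrees_of_freedom(n_groups, n_total):
    """Between-group and within-group degrees of freedom."""
    return n_groups - 1, n_total - n_groups


# Toy data only, to show the mechanics.
f_ratio, df_b, df_w = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_ratio:.2f} with {df_b} and {df_w} degrees of freedom")
print(degrees_of_freedom(5, 146))
```

An observed F is then compared with the tabled value for those degrees of freedom, as the text does when it sets 2.57 against the 5 per cent value of 2.44.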

From the above, it was concluded that, with 
the exception of the Kuder Scientific scale, 
which was later eliminated from the battery, 
the differences which existed were not of a 
magnitude which would preclude combining 
the groups from different hospitals, thus re- 
garding them as derived from a homogeneous 
population. When we consider that all of the
hospitals have similar academic programs, that
their requirements for entrance are similar,
and further, that they admit applicants from
approximately the same area, it seems reasonable
that such influences would assure that the
applicants were derived from a reasonably
homogeneous population.

Thus, after eliminating the trained group
from the over-all sample, there remained five
groups with a total of 146 individuals.

The next phase of the study consisted in
determining the criterion correlations and the
intercorrelations of each of the final measures of
the five untrained groups.


Table 1

Criterion Correlations and Intercorrelations between Variables and the Mean and
Standard Deviation of Each Variable
Note: N = 137 except in variable High School Point Average where N = 136.

Test                               Read  Arith  Scn   PMA   Scn(K)  S.S.(K)  HSPA  HTPA  Mean
1. Reading Comprehension                 .54    .56   .56   -.13             .38   .56   62.9
2. Arithmetic                                   .40   .57   -.02    .04      .33   .39   24.9
3. A.C.E. Science                                     .47   -.12             .33   .57
4. Total P.M.A.                                             -.14    -.03     .46   .48
5. Scientific (Kuder)                                               .14            .00
6. Social Service (Kuder)
7. High School Point Average
8. Hospital Training Pt. Average


Table 2

Table of Beta Weights and Coefficients of Multiple
Correlation Showing Successive Changes as
Measures Are Eliminated from the Battery

Test and Battery        Beta Weights    Multiple R

Coop. Science                           .699
Reading Comp.           .242
High Sch. Pt. Av.       .286
Total P.M.A.
Arithmetic              .014

Coop. Science           .304
Reading Comp.           .246
High Sch. Pt. Av.       .286
Total P.M.A.

Coop. Science
Reading Comp.
High Sch. Pt. Av.
Arithmetic

Coop. Science                           .697
Reading Comp.
High Sch. Pt. Av.


Table 1 gives the product moment correla- 
tions between the variables, with the mean and 
standard deviation of each variable. Both 
scales of the Preference Record yielded very 
low validity correlations; in fact, these two 
measures correlated neither with themselves 
nor with any of the other measures to an extent 
greater than .14. It was at this point that the 
Scientific and Social Service scales were 
dropped from the battery. 

Table 2 gives the beta weights and multiple 
correlation coefficients for various combina- 
tions of the final measures, showing successive 
changes as measures with lower criterion cor- 
relations are eliminated from the battery. 

It was concluded from Table 2 that the most 
practical regression equation for the prediction 
of scholastic point averages for students from 
the general population concerned in this study 
would include the reading test, the science test 
and the high school point average, which gives 
a multiple R of .697 as compared to an R of 
.699 for all measures. The other measures, 
although giving criterion correlations as high 
as .48 (Table 1), do not add sufficiently to the 
predictive power of the battery to justify the 
time and expense involved in administering 
and scoring the tests. 
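
Beta weights and the multiple R can be obtained from correlations alone: the betas solve the normal equations formed from the predictor intercorrelations, and R squared is the sum of each beta times its criterion correlation. The sketch below illustrates this with hypothetical correlations, not the values of Table 1, and uses a small hand-written solver so that it needs no outside library.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # partial pivoting: bring the largest remaining entry to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def multiple_r(r_xx, r_xy):
    """Beta weights and multiple R from predictor and criterion correlations."""
    betas = solve(r_xx, r_xy)
    r_squared = sum(beta * r for beta, r in zip(betas, r_xy))
    return betas, r_squared ** 0.5


# Hypothetical example: two predictors correlating .50 with each other
# and .60 each with the criterion.
betas, big_r = multiple_r([[1.0, 0.5], [0.5, 1.0]], [0.6, 0.6])
print([round(b, 3) for b in betas], round(big_r, 3))
```

When a predictor overlaps heavily with those already in the battery, its beta weight shrinks toward zero and dropping it barely changes R, which is the pattern the text describes for the arithmetic test and the Total P.M.A.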


Shrinkage of the Multiple R 


When the coefficient of multiple correlation 
is determined from a given set of data, as 
above, and is applied to a second set of data, 
the yield in the latter case will be less, even 
though the second set of data is strictly comparable.
This shrinkage of the multiple R
varies with the number of variables contained
in the regression equation, the number of 
cases, and the size of the coefficient of correla- 
tion. A shrinkage-deduction formula has been 
devised by Smith (4) to apply to the coefficient 
of multiple correlation which provides a more 
accurate estimate of the multiple R. When this 
formula is applied to the present data a multiple
R of .686 is forthcoming, indicating that shrink- 
age to the extent of .011 can be expected when 
the regression equation is applied to a second 
group. 
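
The shrinkage formula itself is not printed in the paper. One form commonly attributed to Smith corrects R squared as 1 - (1 - R^2) / (1 - M/N), where N is the number of cases and M the number of variables. The sketch below applies that form as an assumption. With N = 137 (the figure in the note to Table 1) and M = 4 (three predictors plus the criterion, also an assumption) it returns about .686, the value reported in the text; counting only the three predictors gives about .689.

```python
def shrunken_r(r, n_cases, m_variables):
    """Multiple R corrected for shrinkage (assumed Smith-type formula)."""
    r_squared = 1 - (1 - r ** 2) / (1 - m_variables / n_cases)
    return max(r_squared, 0.0) ** 0.5


observed_r = 0.697  # three-measure battery, from the text

for m in (3, 4):
    print(m, round(shrunken_r(observed_r, 137, m), 3))
```

The correction grows as the number of variables rises relative to the number of cases, which is why a short battery on a sample of this size loses so little.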

The formula for the most economical pre- 
diction of the criterion in terms of raw scores 
was: 


$$\text{Hospital Training School Average} = .017\,X_{\text{Read.}} + .029\,X_{\text{Sci.}} + .420\,X_{\text{H.S.P.A.}} - .87$$
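
As a sketch of how the equation would be used in selection, the function below applies the raw-score weights to one applicant and compares the prediction with the cutting score of 1.08 given in the conclusions. The weights are read from the printed equation, and the constant of .87 in particular is a reading of a poorly legible figure, so treat it as an assumption. The applicant scores are made up.

```python
CUTTING_SCORE = 1.08  # one P.E. below the hospital critical score of 1.50


def predicted_average(reading, science, hs_point_avg):
    """Predicted hospital training school average from raw scores."""
    return 0.017 * reading + 0.029 * science + 0.420 * hs_point_avg - 0.87


def would_admit(reading, science, hs_point_avg):
    """True when the predicted average reaches the cutting score."""
    return predicted_average(reading, science, hs_point_avg) >= CUTTING_SCORE


# Hypothetical applicant: reading 60, science 30, high school average 2.0
print(round(predicted_average(60, 30, 2.0), 2), would_admit(60, 30, 2.0))
```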


Conclusions 


On the basis of the study the following con- 
clusions seem to be justified: 


1. The science test, reading test and high 
school point average were fairly effective in 
predicting success in the schools of nursing, 
the multiple correlation coefficient being .697. 

2. Although the arithmetic test and Total 
P.M.A. correlated rather well with the criter- 
ion, .392 and .479 respectively, neither of these 
tests, either separately or together, added ap- 
preciably to the predictive power of the battery. 
Increases in the multiple R were significant 
only in the third decimal place. 

3. The sub-tests of the P.M.A., in general, 
correlated less highly with the criterion than 
did the total of that measure, as evidenced by 
validity coefficients computed for two early 
groups. 

4. Although the averages of the total group 
showed its members to be more interested in
science and social service, as measured by
those two scales of the Kuder Preference 
Record, than women in general, neither scale 
contributed anything to the predictive power 
of the battery. The validity coefficients were 
as follows: Scientific, .000; Social Service, .010. 

5. Significant differences between groups 
from different hospitals appear when one group 
has had one-half to one year’s training prior 
to taking the tests and the remaining groups 
are tested within 30 days of their acceptance 
into the hospital’s training program. Such
differences were most striking on the Reading
Comprehension test, the F-ratio for that test
being 5.31 with the trained group included in
the total population. Although such an
F-ratio indicates differences between the groups
which are significant at the 1 per cent level of
confidence, with the trained group eliminated
the remaining differences are not significant at
either the 1 per cent or the 5 per cent level of confidence.

6. Had the regression equation forthcoming 
from this study been used in selecting the 
trainees participating in the study, 13.1 per 
cent of them would not have been admitted 
(assuming 1.50 as the hospital critical score 
and the score one P.E. below 1.50, or 1.08 as 
the cutting score). 


Received August 25, 1949. 


References 


1. Horner, H. H. Nursing education and practice in 
New York State with suggested remedial measures. 
Albany: University of the State of New York, 
1934, pp. 38 

2. Potts, Edith Margaret. The selection of student 
nurses. Amer. J. Nurs., 1941, 41. 

3. Potts, Edith Margaret. Use of tests in selecting 
student nurses advantageous to hospital and 
student. Hospital Mgmt., 1941. 

4. Smith, B. B. Forecasting the acreage of cotton. 
J. Amer. statist. Ass., 1925, 20, 31-47. 
















Critical Requirements for Dentists * 


Ralph F. Wagner 
American Institute for Research and Department of Psychology, University of Pittsburgh 


An attempt to improve selection methods at 
the School of Dentistry led to the discovery 
that no systematic investigation had been 
carried out to determine the characteristics 
of either a successful dental student or an
effective practicing dentist. Since this infor- 
mation is important for the development of 
personnel procedures, research was undertaken 
at the University of Pittsburgh to obtain a 
precise and practical definition of requirements 
for the profession. 

The method employed is called the critical
incident technique. Information on the procedure
has only recently appeared in the literature.¹
It was developed in order that a comprehensive
list of behaviors of the kind which
make the difference between success and failure
in an activity or profession might be obtained.
Persons in or normally associated with the 
profession and who are considered qualified to 
judge competency with respect to one or more
phases of the job are asked to describe the most
recent incident they observed in which a par- 
ticipant carried out a part of the job either in 
a particularly effective or ineffective manner. 
They are asked to describe the situation, the 
relevant circumstances and exactly what the 
participant was observed to do. No inter- 
pretation is requested regarding abilities, apti- 
tudes, motivations, and attitudes, which might 
have been responsible for the behavior. 

After a large number of incidents are ob- 
tained, an analysis is made to determine the 
specific behavior which caused the observer to 
judge an individual as effective or ineffective. 
A grouping and structuring of these behaviors
results in: (1) a list of those aspects of the job
which are “critical” in the sense that they
caused an observer to make a judgment of job
effectiveness; and (2) a series of specific statements
of the contrasting ways in which effective
and ineffective job participants behave in
carrying out these aspects. The “critical
requirements” of the occupation are then
derived through an analysis and a summarization
of these contrasting ways of carrying out
each aspect.

* The study was conducted in close cooperation with
Dr. L. E. Van Kirk and Dr. W. F. Swanson, School of
Dentistry, and Dr. J. C. Flanagan, Department of
Psychology. Dr. Wagner, who carried out the research,
is Project Director with the American Institute for Research
and Lecturer in the Department of Psychology.

¹ Flanagan, J. C. Job requirements. In Dennis,
W. (Ed.), Current Trends in Industrial Psychology.
Pittsburgh: University of Pittsburgh Press, 1949.


The Present Study 


Incidents were obtained from three sources: 
(1) patients; (2) dentists themselves; and (3) 
instructors in dental school clinics. Patients 
were not expected to supply data regarding 
the technical aspects of dentistry; this infor- 
mation was to be obtained from dentists and 
instructors. As expected, incidents supplied by
patients dealt more with personality, manner, 
business practices, appearance of office, and 
similar factors. Clinic instructors provided 
particularly useful data for determining re- 
quirements both for success in dental school 
and for effectiveness in general practice. An 
instructor has an opportunity to observe dental 
practice being carried on by persons often 
varying considerably in skill and is in a position 
to know the important details when behavior 
which is critical occurs. Instructors thus correspond to
supervisors who have, in other studies, proved
to be a source of particularly useful data. In
the case of practicing dentists there is no individual
in a position comparable to that of an
instructor or supervisor. In most cases only
the dentist who was doing the actual work is
acquainted with all relevant circumstances.
Dentists were therefore requested to describe 
incidents concerning their own practice. 

In securing incidents from all three sources— 
patients, dentists, and instructors—either the 
word “effective” or “competent” was used in 
referring to the desirable type of dentist. 
These words were purposely used in place of
“successful” because of the monetary connotation
of the latter. Moreover, it was requested
that the identity of the individual whose per- 
formance had been observed not be given 
either in effective or in ineffective incidents. 

Patients were requested to describe two 
kinds of incidents—those in which the dentist's 
performance had caused them to recommend 
him enthusiastically to a friend and those in 
which his performance had caused them to 
change or consider changing to a new dentist. 
Dentists were also asked to describe such inci- 
dents in which it had been their own perfor- 
mance which was responsible for the patient's 
action. In addition, however, they were 
asked for incidents in which the patient was 
unaware that particularly effective or ineffec- 
tive performance had occurred but which 
nevertheless had caused the dentist himself 
either to feel a great deal of professional satis- 
faction or to feel that he would perform more 
effectively if given a second opportunity. 
Clinic instructors were asked to describe in- 
cidents in which they had observed a student 
perform in either a particularly effective or 
ineffective manner. Particularly effective per- 
formance was defined as performance the in- 
structor might wish to cite in the classroom, 
insist that all students copy, or the kind which 
would contribute significantly to the student's 
effectiveness if he were in practice. Ineffective 
performance was defined as the kind which, 
if it occurred repeatedly, or even once under 
certain circumstances, would cause the in- 
structor to doubt seriously the student’s prob- 
able effectiveness in practice. 

A total of 781 incidents were obtained: 257
from patients, 359 from dentists, and 165 from 
clinic instructors. One of the most interesting 
findings of the study was the enthusiastic 
manner in which dentists participated. There 
was some concern at first as to how willing an 
individual would be to describe ineffective 
incidents concerning his own performance. It 
was found, however, that dentists gave such 
incidents as fully as they gave effective inci- 
dents. The following example is typical. 

About six months ago a boy of sixteen for 
whom I had done several fillings came in for 
his regular weekly appointment. I was rushed 
and did a rapid filling for him, not being careful 
about the depth of cavity preparation. No 
cement base was placed. Patient returned 
several days later complaining of severe toothache.
I removed filling and found pulp exposure.
Refilled tooth with sedative cement.
Patient did not return but a dentist friend said 
he extracted the tooth for the patient several 
weeks later. 


Another example will further illustrate the 
nature of the data collected in the study. This 
incident was obtained from a clinic instructor 
and should be of interest to dental educators. 


Student came to me with a gold inlay filling. 
He told me filling was not good and he would 
like to do the work over. On examining the 
filling in the mouth I found the filling was not 
quite as bad as the picture he had given me 
and could very possibly have been made satis- 
factory. However this student showed both 
the knowledge and was conscientiously inter- 
ested in the patient’s welfare and his own work 
to repeat the work. Most students are satis- 
fied with only fair work and do not generally 
want to repeat it to make it better. 


An analysis of the incidents from all three 
sources indicated that there were four main 
aspects in serving as a general practitioner. 
The following titles seemed descriptive of these 
aspects: I. Demonstrating Technical Profi- 
ciency; II. Handling Patient Relationships; 
III. Accepting Professional Responsibility; and
IV. Accepting Personal Responsibility. 

The critical behaviors under each of the 
above main areas were grouped into sub-areas. 
The behaviors under Area I were grouped 
according to the specific treatment being ren- 
dered. Development of sub-areas within the 
other major areas, however, was accomplished 
on logical grounds, guided by the actual nature 
and distribution of the behaviors. Although 
the behaviors in these areas were not related 
to any specific type of treatment, it was found 
that they could be grouped into relatively dis- 
crete sub-areas. 

As a means of summarizing the content of 
the incidents, definitions of major areas and 
sub-areas were written. These definitions pro- 
vided a detailed description of the nature of 
the dentist’s work and responsibilities as de- 
rived from the 781 incidents analyzed in the 
study. As a second means of summarizing the
content of the incidents, a tentative group of
40 “critical requirements” was defined. They
consist of a series of statements which express
the specific way an outstanding dentist performs
in the important situations which are
characteristic of his profession. Some indication
of the relative importance among the
40 critical requirements is provided by the
frequency of their occurrence in the incidents
from each of the three sources.²


Table 1

Distribution of Critical Behaviors Among the Four Major Categories

Source         Technical     Patient        Professional     Personal         Total
               Proficiency   Relationship   Responsibility   Responsibility

Patients          107           168               6               94           375
Dentists          138           160              37               95           430
Instructors        65            36              15               59           175


Although it is impossible to present the 40 
critical requirements and their frequencies in 
the present paper, the number of critical be- 
haviors falling into each of the four major 
categories, broken down according to the 
source of the incident, is shown in Table 1. 
These frequencies exceed the number of inci- 
dents collected in the study since an incident 
often contained more than one critical behavior. 
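The relation between incidents and critical behaviors can be checked with simple arithmetic. The sketch below, with variable names chosen only for illustration, sums the Table 1 counts by source and compares them with the incident counts reported in the text.

```python
# Critical-behavior counts transcribed from Table 1, by source and category.
behaviors = {
    "Patients":    {"technical": 107, "patient": 168, "professional": 6,  "personal": 94},
    "Dentists":    {"technical": 138, "patient": 160, "professional": 37, "personal": 95},
    "Instructors": {"technical": 65,  "patient": 36,  "professional": 15, "personal": 59},
}
# Incident counts reported in the text for each source.
incidents = {"Patients": 257, "Dentists": 359, "Instructors": 165}

behaviors_per_source = {src: sum(cats.values()) for src, cats in behaviors.items()}
total_behaviors = sum(behaviors_per_source.values())
total_incidents = sum(incidents.values())

for src in incidents:
    per_incident = behaviors_per_source[src] / incidents[src]
    print(f"{src}: {behaviors_per_source[src]} behaviors in {incidents[src]} incidents "
          f"({per_incident:.2f} per incident)")
print(f"All sources: {total_behaviors} behaviors in {total_incidents} incidents")
```

The row sums reproduce the Total column of Table 1, and the overall count of behaviors exceeds the 781 incidents, as the text states.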


Summary 


The present study furnished information re- 
garding the critical aspects of dental practice. 
Its purpose was not to produce a curriculum
for training dental students. The study does, 
however, furnish information on the specific 
kinds of dental practice which have frequently 
made the difference between effectiveness and 
ineffectiveness both in practice and in dental 
school clinics. The results indicate the areas 
which most frequently cause difficulty while the 
student is in the clinic and after he has gone 
into practice and provide information on the 
type of practice which the patient, the dentist, 
and the clinic instructor consider particularly 
effective. The conclusions which these results 
suggest are as follows: 


1. The requirements for effectiveness in dentistry
are complex. They are not confined
alone to the demonstration of technical
proficiency. Although the critical nature of
this area is strongly supported, there are
non-technical behaviors which are also critical
to effectiveness. “Handling Patient Relationships”
is a particularly important area.
Other critical areas, to use the titles adopted
in the present research, are “Accepting
Professional Responsibility” and “Accepting
Personal Responsibility.”

2. Many of the characteristics which have
commonly been accepted as important for
effectiveness in dentistry must be re-evaluated.
“Ability to converse on topics of the day,” for
example, has been mentioned as important
upon various occasions, yet in the 781 incidents
analyzed in the present study it appeared only
once. The same was found to be true of
“asking questions when the patient is unable
to answer.” And, although applicants have
in the past been rated upon voice quality, this
was not found to be a factor in any incidents.
In comparison, “discussing treatment being
planned or rendered” occurred in 61 incidents.
In view of the complexity of requirements
which the present study suggests, it would
seem advantageous to concentrate on those
which are critical.

3. Critical behaviors revealed by the present
study provide an additional basis for evaluating
clinic performance. A form, developed in
cooperation with persons experienced in dental
education, for recording systematically observed
occurrences of critical behavior would
provide objective evidence on which to base
judgments of effectiveness. These judgments
would be closely related to the requirements of
actual practice. As such they would reflect
the adequacy of training, indicate eligibility
for graduation, and provide the realistic criterion
which selection batteries might strive to
predict.

² Persons interested in a fuller description of the
study, including the tentative statements of critical
requirements and definitions of areas and sub-areas,
should write to the School of Dentistry, University of
Pittsburgh, Pittsburgh 13, Pa. Also, this information
may be ordered as Document No. 2826 from American
Documentation Institute, 1719 N Street, N.W., Washington
6, D. C., remitting $.50 for microfilm (images 1
inch high on standard 35 mm. motion picture film) or
$1.10 for photocopies (6 × 8 inches) readable without
optical aid.


Received March 22, 1950. 
Early publication. 





The Intra-Individual Relationship Between Interest and Ability 


S. M. Wesley, Douglas Q. Corey, and Barbara M. Stewart 
University of Southern California 


An understanding of problems related to 
educational and vocational guidance has be- 
come increasingly important during the post- 
war years. Advisement of veterans, by ar- 
rangement with the Veterans Administration, 
alone has resulted in the setting up of centers 
which annually provide this service to many 
thousands of former servicemen. 

A question which has long been considered a 
primary concern in this field is that of the 
relationship between vocational interests and 
vocational abilities. Summaries of studies 
concerned with the problem are presented by 
Strong (4) whose review illustrates a trend in 
the direction of more adequate methodological 
procedures for determining the degree of this 
relationship. Early investigations compared 
various interests on a vocational interest test to 
overall ability as measured by a single criterion 
such as intelligence test scores or college grade 
point averages. The results of such studies, 
in general, indicated low or negligible correla- 
tions. An explanation for this seemed to lie 
in the fact that different interests were matched 
with a single general ability rather than 
with specific abilities corresponding to those 
interests. 

When studies were designed in which each 
interest was matched to a corresponding
ability, higher correlations were obtained. A 
profitable approach (resulting in significant 
positive correlations ranging from .32 to .40) 
was that of Triggs (5) who compared interest 
scores of one hundred college men on the Kuder 
Preference Record to corresponding ability 
scores on the Iowa High School Content Examination.
However, the magnitude of the
relationship shown by such studies continued 
to be surprisingly low. 

Since the ability scores used in Triggs’ study 
were based on deviations from the group means, 
the following question suggests itself: What is 
the relationship between interest and ability 
when scores for each individual represent de- 
viations from his own mean, rather than from 
the group mean? 




Some evidence that this approach might 
result in higher correlations is given in the 
study of Segel (2). He compared interests, 
as measured by the Strong Vocational Interest 
Blank, with abilities as measured by the Iowa
High School Content Examination. However, 
his method was unique in that he correlated 
interest scores with differences between two 
ability scores, and obtained higher correlations 
than those found between an interest and an 
absolute ability score in a corresponding area. 
For example, scores on the Engineer key corre- 
lated .57 with the difference between scores in 
Mathematics and scores on a History and Social 
Science test, while the same interest correlated 
only .49 with scores in Mathematics alone. 
Although the differences between these two 
types of correlation were not statistically 
significant (results were based on one hundred 
cases), they do suggest that the use of more 
than two abilities in relation to each other 
might point the way to even higher correlations 
than those previously shown. The writers 
felt that if a truly relative score were derived 
by first calculating a mean ability score for 
each individual, a greater relationship might 
be shown to exist. 


Procedure 


In order to investigate this hypothesis a 
study (1, 3) was conducted in which tests of 
interest and ability were administered to 156 
male college students enrolled in an introduc- 
tory psychology course. Since tests were 
administered on different days the number 
taking any one test was not constant and 
ranged from 115 to 132. The measure of interest
used was the Kuder Preference Record
which yielded scores in the following areas:
mechanical, computational, scientific, artistic,
literary, musical and clerical.¹ Measures of
ability were selected to correspond to these
interest categories as follows: Survey of Mechanical
Insight; Stanford Arithmetic Test;
Iowa High School Content Examination,
Section 3, Science; The Meier Art Judgment
Test; Iowa High School Content Examination,
Section 1, English and Literature; Seashore
Measures of Musical Talents, Series A;
and Minnesota Vocational Test for Clerical
Workers.

¹ The persuasive and social service categories were
not included because of the lack of adequate tests of
ability in these areas.

Total scores were computed for all tests and 
were equated by converting to standard scores. 
Pearson r correlations were first obtained be- 
tween interest and its corresponding ability in 
each of the seven areas. This was done in 
order to provide data based on the traditional, 
or inter-individual method, to which results 
of the new, or intra-individual method might 
be compared. 

To obtain correlations based on scores repre- 
senting deviations from the individual's own 
mean, rather than that of the group, the mean 
ability score for each subject was com- 
puted (in terms of standard score units). 
The differences between his separate ability 
scores and his own mean level of ability were 
next determined. These differences then be- 
came the measures of relative ability to be 
correlated with the corresponding measures 
of relative interest. Since the Kuder Prefer- 
ence Record is so constructed that every item is 
chosen at the expense of another, it was as- 
sumed that the resulting interest scores were 
approximately relative to the individual mean 
and that further manipulation of these data 
was therefore not necessary. Pearson r cor- 
relations between relative interest and relative 
ability, for each of the seven areas, were then 
computed. 
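The intra-individual scoring just described can be sketched in a few lines. The scores below are hypothetical standard scores for four subjects in three areas (the study used seven areas and many more subjects), and the helper names `pearson_r` and `relative_scores` are illustrative assumptions. Following the text, the interest scores are treated as already relative and are left untouched.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def relative_scores(profile):
    """Deviations of one subject's scores from that subject's own mean."""
    own_mean = mean(profile)
    return [score - own_mean for score in profile]

# Hypothetical standard scores: one row per subject, one column per area.
ability = [[55, 48, 60], [40, 45, 38], [62, 58, 65], [47, 52, 44]]
interest = [[52, 47, 58], [41, 50, 43], [55, 49, 60], [45, 53, 46]]

relative_ability = [relative_scores(row) for row in ability]

area = 0  # first area, chosen arbitrarily for the illustration
r_group = pearson_r([row[area] for row in ability],
                    [row[area] for row in interest])
r_individual = pearson_r([row[area] for row in relative_ability],
                         [row[area] for row in interest])
print(f"group-mean r = {r_group:.2f}, individual-mean r = {r_individual:.2f}")
```

Each subject's relative ability scores sum to zero, so a high value marks an area in which that subject is strong compared with his own general level rather than with the group.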

Since these correlations indicated the rela- 
tionship only for the group as a whole, it was 
considered desirable to investigate as well the 
variation in interest-ability agreement for indi- 
viduals within this group. Accordingly, for 
each of the one hundred students who had 
completed all tests and for whom complete data 
were therefore available, the standard scores 
for his seven areas of interest and ability were 
arranged in two rank-order sequences. Rho 
coefficients of correlation were then computed 
for each of the individual sets of scores. 
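A rank-order coefficient for one subject can be computed as follows. This is a minimal sketch using the classical formula rho = 1 − 6Σd²/(n(n² − 1)), which assumes no tied scores; the two seven-area profiles are hypothetical, and the function names are illustrative.

```python
def ranks(scores):
    """Rank 1 = highest score; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman_rho(x, y):
    """Spearman rank-order correlation via the sum of squared rank differences."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical standard scores for one subject across the seven areas.
interest_profile = [60, 45, 52, 38, 70, 41, 49]
ability_profile = [58, 50, 47, 40, 66, 44, 43]

rho = spearman_rho(interest_profile, ability_profile)
print(f"rho = {rho:.2f}")
```

Repeating this for every subject gives the distribution of individual coefficients.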




Results 


There were thus obtained, first, seven
interest-ability correlations based on deviations
from group means, and seven correlations
based on deviations from individual means.
As may be seen from Table 1, these ranged
from .07 to .47 (not corrected for attenuation)
where group means were used, and from .23 to
.68 where individual means were used. The
seven “group mean” correlations were averaged
(after transforming to Fisher z values) and
resulted in a mean Pearson r correlation of .30.
The average of the seven “individual mean”
correlations was .42. The t ratio, based on the
difference between mean z values, was 3.3
which is significant above the 1 per cent level
of confidence.²

The 100 individual rank-order correlations
ranged from −.57 to +1.00, and (by transforming
to Fisher z values) a mean of +.46 was
obtained. The difference between this mean
and that of .30 for the “group mean” correlations
also resulted in a t ratio of 3.3, significant
above the 1 per cent level of confidence.
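Averaging correlations through the Fisher z transformation can be sketched as below. The seven values are the “individual mean” correlations as they can be read from Table 1, and the function name is an illustrative assumption. The sketch covers only the averaging step, not the t ratio.

```python
import math

def mean_correlation(rs):
    """Average correlations by transforming to Fisher z, averaging, and transforming back."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))

# "Individual mean" correlations for the seven areas, as read from Table 1.
individual_mean_rs = [0.50, 0.47, 0.35, 0.31, 0.68, 0.23, 0.33]

mean_r = mean_correlation(individual_mean_rs)
print(f"mean r via Fisher z = {mean_r:.2f}")
```

The result agrees, to two decimals, with the mean of .42 reported in the text.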

It was thus shown that significantly higher 
correlations are obtained by the use of the two 
methods presented here than by the more 
traditional method of comparing interests and 
abilities. It is to be expected that with the 
development of better tests, and with more 
exact matching of tests of interest and ability, 
an overall relationship in excess of that found 
here may be demonstrated. Although in every 
vocational area the correlation was increased 
when individual levels of interest and ability 
were used, in calculating the mean correlation 
substantial relationships in some areas were 
offset by low relationships in others. This 
variation in the correlations obtained may well 
be, in part at least, a function of variation in 
the extent to which each of the ability tests 
reflects experience as opposed to aptitude. It 
would be expected that a test which is more
heavily weighted with the experience factor
would show a higher relationship with interest
because of the correlation which we know to
exist between interest and experience. An examination
of the correlations in Table 1 shows
that those tests in which the experience factor
would play a larger part, such as the Iowa
Literary Test and the Survey of Mechanical
Insight, have a higher correlation with interests
than do those ability tests such as the Minnesota
Clerical and the Seashore Tests which
have a lower weighting with the experience
factor. However, there still exists the reasonable
hypothesis that there is a genuine variation
in the degree of relationship between interests
and abilities for different activity or
vocational areas.

² This difference is even more significant when it is
considered that the formulas available for testing significance
of differences between correlations assume
that the figures are obtained from independent random
samples. In this study, of course, the correlations
were obtained from the same sample, so that any such
test of significance would err on the conservative side.


Table 1

Correlations Between Interest and Ability Based
on Deviations from Group and from
Individual Means

                   Individual Means
Area               N*       r

Mechanical         126     .50
Computational      112     .47
Scientific         126     .35
Artistic           127     .31
Literary           125     .68
Musical            118     .23
Clerical           125     .33

Mean                       .42

[Group Means columns illegible]

* The size of N in each case depended on the number
of subjects for whom necessary data were available.


The Meaning of Individual Differences 
in Interest-Ability Congruency 


The wide range in interest-ability congruency
for different individuals (−.57 to +1.00)
raises what may be an important question for 
those concerned with vocational guidance: 
Why for some individuals is the correlation 
high and positive, while for others it is low or 
even negative? In order to explore this area, 
age, intelligence and personality factors were 
studied to determine whether they might be 
related to these individual differences. The 
measures of intelligence and personality, re- 
spectively, were the Army Alpha Examination, 
First Nebraska Revision, and the Minnesota 
Multiphasic Personality Inventory. An upper
and lower 25 per cent of the group were
selected, based on those having highest and 
those having lowest interest-ability congru- 
ency as shown by the individual rank-order 
correlations. For these extreme groups a 
comparison was made of mean age and of mean 
intelligence scores, but the differences were
found to be insignificant. Similar comparisons
were made between mean scores obtained
by the upper and lower groups for each
of the nine categories of the Minnesota Multiphasic
Personality Inventory. Although only
one significant difference (for the Schizophrenia 
scale) was found, the group having highest 
agreement between interests and abilities 
showed scores on eight of these scales above 
those of the other group, and in a direction 
away from the level of “normal” adjustment. 
The meaning of this finding is not clear, but 
it appears to be related to a greater differentia- 
tion in interests and in abilities for those 
having a less adequate personality adjustment. 
It is felt that a different test of personality, 
concerned with basic character structure rather 
than with nosological groups, might reveal 
important differences between those indi- 
viduals whose interests and abilities are in 
agreement and those where marked deviations 
are found. It is possible that the use of pro- 
jective measures of personality would demon- 
strate such a relationship and contribute to an 
increased understanding of these individual 
differences in interest-ability congruency. 


Predictability of Extreme Ranks 


In computing the rank-order correlations 
between interest and ability for the individual 
subjects in the study, a tendency was noted for 
high ranking and low ranking ability areas to 
have greater predictability in terms of interest 
test scores than those ability areas in the 
middle of the individual’s range. 

Further analysis of the data was therefore 
made to determine the predictability of each 
interest rank from ability rank. A study of 
the rank-order sequence of interest and ability 
for each individual showed that for 31 per cent 
of the cases the highest interest fell in the same 
area as the highest ability. For 20 per cent of 
the cases, the second highest interest was in 
the same area as the highest ability. Continuing
this procedure, the percentages became
increasingly smaller so that for only 4 per cent 
of the cases was the seventh, or lowest, interest 
in the same area as the highest ability. Thus, 
for 51 per cent of the group, the first or second 
highest interest was in the same area as the 
highest ability. Chance alone would have yielded
only 29 per cent (two ranks out of seven), so
predictability was 22 percentage points better than chance.
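The comparison with chance can be reproduced with a little arithmetic. In the sketch below the observed percentages come from the text; the helper function and the single seven-area profile used to exercise it are hypothetical.

```python
from fractions import Fraction

def top_ability_in_top_k_interests(interest, ability, k=2):
    """True when the area of highest ability is among the k highest interests."""
    top_ability_area = max(range(len(ability)), key=lambda i: ability[i])
    by_interest = sorted(range(len(interest)), key=lambda i: interest[i], reverse=True)
    return top_ability_area in by_interest[:k]

n_areas = 7
chance_rate = Fraction(2, n_areas)                        # two favorable ranks out of seven
observed_rate = Fraction(31, 100) + Fraction(20, 100)     # 31 and 20 per cent, from the text
gain_points = float(observed_rate - chance_rate) * 100

# Hypothetical profile for one subject, used only to exercise the helper.
interest_profile = [60, 45, 52, 38, 70, 41, 49]
ability_profile = [58, 50, 47, 40, 66, 44, 43]
hit = top_ability_in_top_k_interests(interest_profile, ability_profile)

print(f"chance = {float(chance_rate):.1%}, observed = {float(observed_rate):.0%}, "
      f"gain = {gain_points:.0f} points, example hit = {hit}")
```

Applying the helper to every subject and averaging the hits would give the observed rate directly from raw profiles.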

Similar procedures were applied in studying 
the number of cases in which the second highest 
interest fell in the same area as the second 
highest ability, and so on throughout the seven 
different ranks. It was shown that prediction 
from ability to interest for rank one and for 
rank seven was better than prediction for 
ranks two through six. This is explained in 
part, of course, by the fact that, at the ends, 
the error can extend in only one direction, while 
in the middle it can vary in two directions. 

It is probably this end-effect phenomenon 
which has led vocational counselors to the 
clinical belief that there is a higher relationship 
between interests and abilities than has been 
shown by statistical studies, where the inter- 
vening ranks must also be considered and where 
overall accuracy of prediction is lowered ac- 
cordingly. In observing generally good agree- 
ment between highest interest and highest 
ability, and between lowest interest and lowest 
ability, counselors have to this extent at least 
considered the ordinal position of both, and 
have thus actually based guidance on the rela- 
tion of scores to the individual’s own mean 
rather than to that of the group. Although 
increased accuracy of prediction from extreme 
ranks may be largely a statistical artifact, the 
use of such prediction for purposes of vocational 
guidance would appear to be justified. 


Summary 


The purpose of this study was to investigate 
the relationship between vocational interests 
and abilities when the magnitude of test scores 
is relative to the individual’s own level, rather 
than to the group level of interest and ability. 

The procedure and findings were as follows: 


1. The Kuder Preference Record and ability
tests corresponding to seven of the Kuder 
interest areas were administered to 156 male 
college students. 




2. A Pearson r correlation was first obtained 
between each interest and its corresponding 
ability for scores based on deviations from 
group means. The mean of these seven cor- 
relations was .30. 

3. Pearson r correlations between interest 
and ability were then obtained for scores 
based on deviations from individual means.
The mean of these seven correlations was .42 
which was shown to be significantly higher 
than the mean of .30 derived from the scores 
based on deviations from group means. 

4. Rank-order correlations between the seven 
interest areas and the seven ability areas were 
computed for one hundred individuals. The 
mean of these correlations was .46, which was 
shown to be significantly higher than the mean 
Pearson r correlation of .30 derived from scores 
based on deviations from group means. 

5. There was shown to be a wide range of 
individual differences in interest-ability con- 
gruency. Rank-order correlations ranged from 
−.57 to +1.00. For the 25 per cent of the
group having highest and the 25 per cent 
having the lowest interest-ability correlations, 
no significant difference was shown between 
mean age and mean intelligence. However, 
the group having the highest agreement did 
show a tendency to less adequate personal 
adjustment in that mean scores on eight of the 
nine scales of the Minnesota Multiphasic Per- 
sonality Inventory were higher than mean scores 
for the group having lowest interest-ability 
agreement. Only one of these differences, 
that for Schizophrenia, was significant. 


It was suggested that the use of projective 
techniques would be a useful approach in 
further study of the personality factors related 
to this individual variation in the congruency 
between interest and ability. 


6. For individual interest and ability scores 
arranged in rank order, it was shown that pre- 
diction from the extreme ability ranks, one 
and seven, to the corresponding interest 
ranks was much better than prediction from 
the intervening ranks, two through six. Al- 
though this is largely an artifact due to the 
end-effect, and not a true difference in efficiency 
of prediction, it has probably led vocational 
counselors to consider the extreme ordinal positions 
of interests and abilities for a given individual, 
and to apply these findings in their 
vocational guidance work. Such a procedure 
would seem to be justified and of value when 
applied in this manner. 
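
The two scoring bases contrasted in steps 2 through 4 can be illustrated with a brief sketch. It is not the authors' computation: the scores are invented, they are assumed to be already expressed on a common standard scale, the function names are hypothetical, and the plain arithmetic mean of the correlations is an assumption about how the seven values were averaged.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """Rank 1 = highest score; tied scores share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def individual_deviations(rows):
    """Express each person's scores as deviations from that person's own mean."""
    return [[v - mean(row) for v in row] for row in rows]


def mean_area_r(interest, ability):
    """Mean over areas of the Pearson r computed across persons (steps 2 and 3)."""
    n_areas = len(interest[0])
    rs = [pearson_r([p[a] for p in interest], [p[a] for p in ability])
          for a in range(n_areas)]
    return mean(rs)


def per_person_rank_r(interest, ability):
    """Rank-order correlation across areas, one value per person (step 4)."""
    return [pearson_r(ranks(i), ranks(a)) for i, a in zip(interest, ability)]


# Invented standard scores: rows are persons, columns are interest/ability areas.
interest = [[60, 50, 40], [45, 55, 50], [52, 38, 60], [40, 48, 44]]
ability = [[58, 52, 45], [50, 47, 56], [49, 41, 57], [43, 50, 39]]

# Subtracting a group (column) mean does not change r, so raw scores serve here.
group_based = mean_area_r(interest, ability)
individual_based = mean_area_r(individual_deviations(interest),
                               individual_deviations(ability))
rank_based = per_person_rank_r(interest, ability)
```

The rank-order coefficient is obtained here as a Pearson r on ranks, which handles tied scores; with real data one coefficient per individual would be computed over all seven areas and the resulting values then averaged.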


Received August 31, 1949. 


References 


1. Corey, D. Q. A comparison of two methods of 
determining the relationship between vocational 
interests and abilities. Unpublished Master's 
thesis, University of Southern California, 1947. 

2. Segel, D. Differential prediction of scholastic 
success. Sch. and Soc., 1934, 39, 91-96. 

3. Stewart, Barbara M. A study of individual 
variability in the relationship between interest and 
ability. Unpublished Master's thesis, University 
of Southern California, 1947. 

4. Strong, E. K. Vocational interests of men and women. 
Stanford University, California: Stanford University 
Press, 1943. 

5. Triggs, Frances O. A study of the relation of Kuder 
Preference Record scores to various other measures. 
Educ. psychol. Measmt., 1943, 3, 341-354. 








A Projective Test for Vocational Research and Guidance 
at the College Level * 


Robert B. Ammons 
University of Louisville 


Margaret Newman Butler 
Colorado Woman's College 


and 


Sam A. Herzig 
University of Denver 


In their professional work the clinical psy- 
chologist and vocational counselor are often 
called upon to relate for a given individual 
problems associated with occupation to under- 
lying personality structure. This need to re- 
late personality and vocations has been dis- 
cussed at some length by Bixler and Bixler (7), 
Darley (10), Kilby (14), and Trabue (23). As 
a practical matter, it is often necessary to 
recognize stated vocational problems as symp- 
tomatic of deeper personality imbalances, and 
to attempt to treat these more fundamental 
disturbances prior to attempting to find solu- 
tions for the vocational problems. 

Berkshire, Bugental, and Cassens (6) in a 
survey of tests used in guidance centers report 
that by far the most frequently utilized per- 
sonality instruments are of the paper and pencil 
type, the Bell Adjustment Inventory, the Min- 
nesota Multiphasic, and the Bernreuter Person- 
ality Inventory. The known clinical inade- 
quacy of these tests is such that no comment is 
necessary here. The Rorschach is reported 
as being used in only 20 per cent of the guid- 
ance centers in the sample, and the Thematic 
Apperception Test is not even listed among the 
79 tests estimated to be most frequently used. 
Although formal projective devices are appar- 
ently not widely used in guidance centers, they 
are accepted as being valid for general clinical 
use, and are sometimes used in conjunction 
with counseling (2). 

* Plates and manual (1) can be obtained from R. B. 
Ammons. Thanks are due Professor R. A. Irwin of 
the University of Nevada, Professor R. B. Winn of 
Monmouth College, and Mrs. Carol Ammons of the 
University of Louisville for critical reading of this 
article and many helpful suggestions. 


In his comprehensive review of projective 
techniques, Bell (4) mentions only a few syste- 
matic attempts to link projective methods of in- 
vestigating personality and vocational problems. 
Shagass (20) reports the use of word association 
tests in Canadian pilot selection. The Ror- 
schach has been used for job screening (3), in- 
vestigation of personality structures associated 
with particular occupations (12, 18, 19), and 
selection of mechanical workers (17). Tom- 
kins (22) devotes a chapter of his manual for 
the TAT to personality diagnosis with respect 
to work and vocation setting. Projective tests 
have not been used extensively in vocational 
guidance for a number of reasons, among them 
being the training and experience necessary for 
interpretation, the “excessive” time require- 
ment, and the usual difficulty in relating 
personality structure to vocational problems 
in a really useful way. The importance of this 
latter limiting factor is well illustrated in a 
study by Kurtz (15) who found that even a 
specially constructed scoring system did not 
permit a satisfactory prediction on the basis of 
Rorschach scores of the success or failure of 
sales managers. 

A more direct approach to vocational guid- 
ance problems has been made with interest 
tests. Berkshire, Bugental, and Cassens (6) 
report the Kuder Preference Record, the 
Strong vocational interest blanks, and the 
California Test Bureau’s Occupational In- 
terest Inventory to be the most frequently 
used. All three tests call for rigidly cate- 
gorized answers, and scores are interpreted 
either on a strictly empirical basis in terms of 
patterns for people already engaged in an 
occupation (Strong, Kuder) or a rationalistic 
a priori basis (California). Although tests of 
this type work fairly well in practice, they 
suffer from many weaknesses: (a) the client 
does not indicate reasons for his preferences; 
(b) interpretation tends to underemphasize the 
relationships of the obtained interest patterns 
to the total personality (8, 14); (c) subtleties of 
feeling cannot be expressed in terms of the 
categorized prepared answers; (d) little per- 
tinent qualitative information can be gained by 
observation during testing; and (e) several 
studies (9, 13, 16, 21) have shown that 
answers to such tests can be “slanted” almost 
at will. 

Briefly, it can be generalized that present 
projective tests do not satisfactorily relate 
personality to vocational problems, and present 
interest inventories do not relate vocational 
problems to personality. Berdie (5) in his 
review of factors related to vocational interests 
mentions several studies which report statis- 
tically significant relationships between paper 
and pencil personality tests and vocational 
interest tests. Such information is interesting 
and provocative. What seems to be needed 
is a test which combines the specificity of 
content now found in the interest inventories 
with the flexibility and depth inherent in pro- 
jective tests, and gives information as to the 
client’s interests, their origins, their function 
in the general personality structure, and how 
they are likely to affect behavior in the future. 


Problem 


The purpose of this study was to construct 
a projective test that would measure vocational 
attitudes and interests, and at the same time 
give information concerning related psychologi- 
cal forces operating within the individual's 
personality. To accomplish this, the following 
steps were undertaken: (a) construction of 
drawings of vocational situations which would 
meet requirements of neutrality, vagueness, 
and disguised purpose, which nevertheless 
would evoke a significant variety of responses 
related to vocations; (b) devising of a scoring 
system to identify and objectify significant 
aspects of responses; (c) ascertaining of the 
reliability of the scoring system; and (d) 
estimating the validity of the obtained test 
scores by determining their capacity to distinguish 
between groups of known composition, 
and their consistency with personal data and 
results from other tests. 


Procedure 


Materials: The Vocational Apperception Test¹ 
consists of 18 line drawings, 8½ × 5½ inches (1). 
An individual engaged in a specific occupation 
is shown in each. Ten of these were designed 
for administration to women. To facilitate 
identification, the main figures on these 10 
plates are women. The remaining 8 plates 
were structured for men, with corresponding 
central male figures. 

The ten occupations depicted for women are: 
(a) laboratory technician, (b) dietician, (c) 
buyer, (d) nurse, (e) teacher, (f) artist, (g) 
secretary, (h) social worker, (i) mother, and (j) 
housewife. The plates for the eight men’s 
occupations show: (a) teacher, (b) executive or 
office worker, (c) doctor, (d) lawyer, (e) engi- 
neer, (f) personnel or social worker, (g) sales- 
man, and (h) laboratory technician. The 
occupations shown were arbitrarily chosen 
because of ease of representation and represen- 
tativeness of occupation. 

In drawing the plates, certain rules were care- 
fully followed. Scenes dealt as specifically as 
possible with one particular occupation or 
type of occupation. Facial expressions and 
posturings were ambiguous in feelings ex- 
pressed. Line drawings were used, since they 
were felt to be less realistic than photographs, 
less structured, yet could emphasize the 
desired aspects of an occupational situation 
quite adequately. 

Before the final sets of plates were drawn, 
five preliminary plates for five occupations 
were used to test 15 college men and women 
informally. Results from this preliminary 
testing and accompanying interviewing were 
used to work out the final testing procedure. 
The test results and discussion indicated that 
occupational representations should be made 
less ambiguous and content simpler. This 
lead was followed in drawing the final set of 
18 plates. 

Subjects: 40 female and 35 male subjects (Ss) 
were selected for the two experimental groups. 


¹ Henceforth to be referred to as the VAT. 
















Women Ss were sophomores at Colorado 
Woman's College to whom the 1946 edition Strong 
Vocational Interest Tests for Women (Re- 
vised) previously had been administered as a 
part of a general testing program for all 
students. Answers were machine scored for 
all scales and records showing only one “A” 
scale rating were segregated from those of the 
entire sophomore class. Inasmuch as a large 
proportion of Ss had “A” housewife ratings, in 
addition to other primary patterns, it was 
necessary in some cases to disregard the house- 
wife scores. However, Ss selected for the 
housewife criterion group had no “A” ratings 
except for housewife. Five women were 
randomly selected from each of 8 occupational 
categories thus set up: laboratory technician, 
dietician, buyer, nurse, teacher, artist, secre- 
tary, and housewife. 

The 35 men who participated were junior, 
senior, and graduate students at the University 
of Denver. Five advanced students majoring 
in and feeling a primary interest in each of the 
7 following fields were chosen:² teaching, accounting, 
premedical training, engineering, 
social work, salesmanship, and art. All were 
given the 1938 Strong Vocational Interest 
Blank for Men (Revised). Answers were 
machine-scored for all men’s scales. 

In the selection of the samples there was no 
direct control of age or previous occupational 
experience. 

Testing: The VAT was individually admin- 
istered, responses were recorded verbatim, and 
manner of responding noted. Es and Ss were 
matched as to sex, male testing male, and 
female, female. The following instructions 
were given: 

“The purpose of this test is to find out how 
people go about understanding human be- 
havior. You probably realize that insight 
into others’ behavior helps us get along with 
them. Ordinarily, when we meet people, we 
try to ‘size them up.’ We do it all the time. 
I am going to show you some cards and your 
job will be to tell me a story about the people 
pictured on these cards. On each card, will 
you tell me how the person came to be in this 

* These fields were chosen because students in them 
were available. They are those represented on the 
plates, except that law students and laboratory tech- 


nicians were omitted, and art students added to the 
total group. 


Robert B. Ammons, Margaret Newman Builer, and Sam A. Herzig 


situation, how he (she) feels about it, and what 
the future holds in store for him (her). Often, 
it will be necessary to use your imagination.” 

Following completion of the VAT, male Ss 
were asked what they thought was being 
measured. At all times Es’ comments were 
guarded so as not to indicate the purpose of 
the test, to minimize conscious “slanting” of 
stories by the Ss. 

Scoring: It has already been mentioned that 
scores were available for all Ss for all scales of 
the Strong vocational interest blanks. The 
real need was for a method of scoring VAT 
responses. It was decided to score for general 
attitude toward a specified occupation (using 
S’s report and picture content to decide sub- 
jectively and arbitrarily which occupation), 
reasons for entering an occupation, conflict 
areas or areas of concern, and vocational or 
personal outcomes. The following outlines 
cover the scoring system as used by Es and 
other judges. Scoring consisted of categoriz- 
ing each response under one or more of the 
sub-headings in each main scoring area. 

A. General attitude toward an occupation 
(indications from verbalization or judged feel- 
ing tone): 1. Like! 2. Like. 3. Indifferent. 
4. Dislike. 5. Ambivalence. 

B. Reasons for entering occupations: 1. Interest 
and enjoyment. 2. Ability. 3. Status. 
4. Income. 5. Opportunity. 6. Security. 7. 
Altruism. 8. Contact with people. 9. Seclusion. 
10. Forced into. 11. Excitement or 
curiosity. 12. Experiment. 13. Independence. 
14. Transfer of training. 15. Temporary employment. 
16. Training. 17. Contact with 
field. 18. Desire to influence others. 19. 
Idealism. 20. Desirable conditions of employment. 
21. None stated. 

C. Areas of conflict or concern: 1. Personal 
conflict (generalized): a. achievement; b. affiliation; 
c. aggression; d. inadequacy or insecurity; 
e. independence; f. recognition; g. valuative. 
2. Home and parental conflict. 3. Marital 
conflict. 4. Financial conflict. 5. Educational conflict. 
6. Vocational conflict. 7. Health conflict. 
8. No conflict mentioned. 

D. Vocational and personal outcomes: 1. Success. 
2. Continues in field: a. mediocre success; 
b. transfer to better job within field; c. success 
of no importance; d. no mention of success; e. 
dissatisfaction in field; f. secures additional 
training. 3. Leaves field. 4. Not clearly stated. 
5. Confusion. 6. Disaster. 7. Continuous dissatisfaction. 

For Women Only: 8. Marry, but continue in 
field. 9. Marry individual who has an allied 
interest in field. 10. Marry and leave field. 

A more detailed account of scoring pro- 
cedures and criteria can be found in the 
manual for the VAT (1). 


Results 


Reliability of scores: When Es first attempted 
scoring with only the outline of the present 
scoring system, the percentage of rescoring 
self-agreement ranged from 66 to 75, calculated 
in terms of the number of times the stories were 
categorized in the same way in the various 
scoring areas. This was not felt to be satis- 
factory, so more precise criteria for differentia- 
tion were worked out. After a great deal of 
discussion of scoring problems coupled with 
repeated scorings of the same records, stand- 
ards were set up (1). 

Ten protocols were then chosen at random 
from the as yet undiscussed records and each 
of two Es scored five independently on two 
occasions a week apart. Rescoring self-agree- 
ment was now 86 per cent for all protocols and 
all scoring categories taken together. Agree- 
ment varied little for the various scoring as- 
pects (general attitude, areas of conflict, out- 
comes, reasons for entering) and there was 
practically no difference in self-agreement 
with male and female records. Inspection of 
the records showed no significant difference be- 
tween these experienced scorers in reliabilities. 
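
The percentage of rescoring self-agreement described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure in the manual: the scorings are invented, the function names are hypothetical, and agreement is taken to mean that exactly the same set of categories was assigned to a story within a scoring area on both occasions.

```python
from collections import defaultdict


def percent_agreement(first, second):
    """Percentage of scoring decisions categorized the same way in both scorings.

    Each argument maps (story, scoring_area) to the set of categories assigned.
    """
    shared = first.keys() & second.keys()
    if not shared:
        raise ValueError("the two scorings have no decisions in common")
    same = sum(1 for key in shared if first[key] == second[key])
    return 100.0 * same / len(shared)


def agreement_by_area(first, second):
    """Percent agreement computed separately for each scoring area."""
    grouped = defaultdict(lambda: ({}, {}))
    for key in first.keys() & second.keys():
        area = key[1]
        grouped[area][0][key] = first[key]
        grouped[area][1][key] = second[key]
    return {area: percent_agreement(a, b) for area, (a, b) in grouped.items()}


# Invented scorings of two stories on two occasions a week apart.
week_1 = {
    ("story 1", "attitude"): {"Like"},
    ("story 1", "conflict"): {"Vocational conflict"},
    ("story 1", "outcome"): {"Success"},
    ("story 2", "attitude"): {"Ambivalence"},
    ("story 2", "conflict"): {"Financial conflict", "Marital conflict"},
    ("story 2", "outcome"): {"Leaves field"},
}
week_2 = dict(week_1)
week_2[("story 2", "conflict")] = {"Financial conflict"}  # one changed decision

overall = percent_agreement(week_1, week_2)
by_area = agreement_by_area(week_1, week_2)
```

The same two functions would serve for inter-person consistency by passing the scorings of two different judges instead of two occasions for one judge.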

Since it was clearly possible for experienced 
judges (2 Es) to attain a high level of personal 
consistency in scoring, a check on inter-person 
consistency was made. Four graduate students 
with experience in TAT scoring³ were 
given two randomly selected protocols each 
and a set of scoring instructions. General 
scoring methodology was explained, and any 
questions about the scoring categories as given 
in the instructions were answered. The re- 
sulting scorings were compared with previous 
scorings of the same records by the two experienced 
Es. Although the inter-person consistency 
between experienced and inexperienced 
scorers was not as high (mean of 69 
per cent) as the personal consistency of the 
more experienced Es, it was felt to be satisfactorily 
high. There was a wide difference 
in the scoring agreement of these various outside 
judges, one showing quite poor agreement 
(54 per cent) and another agreeing with the experienced 
E essentially as closely (85 per cent) 
as he did with himself. Scoring agreement was 
lower for areas of conflict (57 per cent agreement) 
than for either general attitude (74 per 
cent) or outcomes (76 per cent). However, 
scoring aspect had less effect on agreement 
than the personal scoring proficiencies of different 
judges.⁴ 

³ Thanks are due Miss Janet Ambler, Mrs. Helen 
Ammons, Mr. Seymour Levine, and Mrs. Ann Neel 
for their time willingly spent serving as judges. 

With several hours training a high level of 
VAT scoring agreement could certainly be 
reached. Tomkins (22) points out that “. . . 
in TAT workshops it is common for the relia- 
bility of ratings to be very low at the beginning 
but to increase to respectable magnitude with 
practice.” 


Comparison of Strong Ratings 
with VAT Preferences 


Each story was scored only once, and 
only for the occupation the plate was designed 
to depict. The story was not scored in the 
few instances (less than 1 per cent) where the 
occupation was completely misinterpreted. 
Combining categories, a chi-square test of the 
independence of Strong scores and VAT 
preference ratings was made for all Ss. The 
hypothesis of no relationship could be rejected 
at the 10 per cent level of confidence for the 
women and at the 2 per cent level of confidence 
for the men. Although a relationship was thus 
demonstrated, it is not as clear-cut as one 
might like. This may be due to the fact that 
on the whole, although a person may not have 
many interests in common with people already 
engaged in and successful in an occupational 
area, he may be well disposed toward the area 
as he understands it. Thus it might be expected 
that the relationship between Strong 
scores and VAT ratings would not be high. 

⁴ To save space and cost, Tables 1, 2, 3, 4, and 5 have 
been deposited with the American Documentation 
Institute. Order Document No. 2748 from the American 
Documentation Institute, 1719 N Street, N.W., 
Washington 6, D. C., remitting $.50 for microfilm 
(images 1 inch high on standard 35 mm. motion picture 
film) or $.50 for photocopies (6 × 8 inches) readable 
without optical aid. 
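
A chi-square test of independence of the kind reported in this section can be sketched as follows. The counts are invented, since the actual tables are deposited with the ADI, and the 2 × 2 layout, the absence of a continuity correction, and the function names are all assumptions made for illustration rather than details taken from the study.

```python
import math


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column classifications were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_one_df(stat):
    """Upper-tail probability of a chi-square value with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Invented counts: rows = Strong rating high / low, columns = VAT like / other.
counts = [[10, 20], [20, 10]]
stat, df = chi_square_independence(counts)
p = p_value_one_df(stat)
```

The closed-form probability applies only when there is one degree of freedom; for larger tables the statistic would be referred to a chi-square table with the degrees of freedom returned by the first function.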


Characteristics of Responses to the VAT 


Reasons for entering the occupation were 
found difficult to score, so scoring was not done 
in this area. The first analysis made was of 
total number of conflicts by conflict area and 
occupation⁵ for the women and men Ss respectively. 
The men showed more conflicts 
than women (a mean of 2.0 per story as com- 
pared with 1.7). The men were more con- 
cerned with achievement, insecurity, voca- 
tional, and personal value conflicts; while the 
women showed more marital and affiliation 
conflicts. The men showed little overall difference 
in type of conflict from vocation to vocation, 
except perhaps for teaching; while women 
gave evidence of frequent conflicts associated 
with teaching, but very few in the housewife 
and mother areas. Although the data are not 
conclusive, there seems to be some evidence 
for an interaction between nature of conflict 
and specific occupation. The women showed 
more conflicts of aggression and insecurity in 
the teacher area, and more conflicts of achieve- 
ment, recognition, and vocational choice in the 
artist area. 

An analysis was made of the outcomes associ- 
ated with the various plates by the total male 
and female groups. The most frequent out- 
comes in women’s stories were success, con- 
tinuing in the field without success being men- 
tioned, and marrying and leaving the field. 
Laboratory technicians, buyers, and nurses 
were pictured as most successful, while dieti- 
cians and teachers were described as leaving or 
desiring to leave the occupation. Nurses, 
artists, and social workers married, but stayed 
in the field, and laboratory technicians married 
some one with an allied interest. Finally, 
secretaries, dieticians, and nurses were often 
pictured as marrying and leaving the field. 

The most frequent outcomes for men were 
partial or complete success; it would seem that 
they must achieve some kind of success. Law- 
yers, laboratory technicians, and doctors were 
the most successful; the salesman moved to a 
better position within the same employment 
area; and the teacher left that field of employment. 

⁵ Tables 1, 2, and 3 are included among those available 
from the ADI (see footnote 4). 

⁶ See Table 4 included in the ADI set (footnote 4). 

Ss of both sexes infrequently described 
teachers as successful, and even where they 
were successful the success tended to be only 
mediocre. Both men and women put great 
emphasis on success, and in addition women 
frequently told stories about retiring from 
economic competition to become housewives 
and mothers. 

The final analysis was of the average num- 
bers of words and the associated standard de- 
viations in stories told by the male and female 
Ss to cards picturing similar occupations. 
Mean lengths were essentially the same (range 
from 104 to 140 words) for the six pairs of 
cards,⁷ differing only in the case of teaching 
where men told significantly shorter stories 
(mean of 90 words). The story lengths were 
about the same for all occupational situations 
except for teacher for men and lab technician 
for both men and women, which were shorter. 


Insight Regarding Purpose of VAT 


After each male subject had been tested 
with the VAT, he was asked what he thought 
the purpose of the test might be. It was 
concluded that only 7 of the 35 men came near 
to understanding the purpose of the test. 
From the responses given, it could be seen that 
the likelihood of undesirable systematic slant- 
ing to create an impression was small. 


Qualitative Evaluation 

Much useful information can be gained 
clinically from qualitative observations and 
use of “total impressions.” Certain observations 
worthy of mention were made during the 
VAT testing program. From content analysis 
of the protocols, it was possible to obtain a 
fairly definite idea of such things as the extent 
and accuracy of S’s information about an 
occupation, the use to which he would put 
physical equipment and his feelings toward it, 
some of his acute personal problems not directly 
associated with occupations, the place of vocations 
in his personality dynamics, and how he 
would like to handle his personal problems 
concerning vocations. It was usually easy to 
estimate the level of identification with the 
principal figure in the story, and the degree of 
personal involvement in the story. Significant 
personal data of many kinds were obtained, 
particularly concerning traumatic experiences. 

⁷ Table 5 is available in the ADI set (see footnote 5). 
Comparisons are made of lab technician with lab technician, 
salesman with buyer, doctor with nurse, teacher 
with teacher, engineer with artist, office worker with 
secretary, personnel worker with social worker. 

As a rule, when presented with plates con- 
cerning their own occupational interests, Ss 
would show a marked increase in enthusiasm. 
Ten Ss well known to Es gave clearly recog- 
nizable stories. Blind matching would prob- 
ably have been perfect or nearly so with actual 
vocational sketches supplemented by personality 
information. 


The following story illustrates the power of 
the technique: Male, college senior, age 24, 
story to Plate 4 for men: He’s an attorney 
before a court. On one side you can see the 
judge; on the other side, the jury. He’s de- 
fending a guilty person. Like any lawyer, he’s 
looking for any angles to prove the situation. 
He’s not sure of himself, if the circle on the 
top of his head is a question mark. Am I right? 
What do you see it as? Why don’t you ever 
tell me if I’m right or not? He’s been pushed 
into this job and can’t get away from it. It 
looks like a pretty tough job. He’s not suc- 
ceeding. He wonders if he’s going to change 
his job or not. He has in mind this problem, 
and is questioning himself as to what kind of 
job he will have. He wonders if he is going to 
succeed or not. It seems to me that he’s not 
sure of himself. He never has been. There is 
something in back of his mind that he doesn’t 
want to tell people about this. He would 
rather hide it than tell other people. A kind 
of non-professional job. He would rather not 
have too much responsibility. 


Discussion 


The basic assumption of projective testing 
is that S will interpret stimuli in a way which 
will reflect the cognitive and emotional organi- 
zation of his personality. There is a con- 
siderable amount of evidence that the VAT 
provides a suitable situation for the projection 
of feelings and ideas related to S’s vocational 
problems. Scored conflict areas were different 
for stories about different occupations, and 
differed for different individuals. Outcomes 
varied with the occupational situation about 
which a story was told and the sex of S. In 
view of the high scoring reliability these find- 
ings seem to indicate a basic validity for the 
procedure, and are supported in this by a 
qualitatively observed close correspondence be- 
tween information derived from personal ob- 
servation and that from test responses con- 
cerning personality facets and vocational prob- 
lems. The wide variability in the character- 
istics of the responses given speaks against 
the hypothesis that the content and structure 
of the pictures primarily determined the stories 
told. 

The only findings which might be in- 
terpreted as indicating a low validity were 
disagreements in interpretation and a low 
relationship between Strong interest ratings 
and VAT attitude-toward-occupation ratings. 
There were a considerable number of disagree- 
ments in interpretation of responses within the 
scoring system as set up, as evidenced by the 
failure to obtain more nearly perfect scoring 
consistency. Analysis of these disagreements 
almost always led to the conclusion that both 
interpretations were reasonable, and that they 
merely represented essentially equally valid 
but different levels of abstraction in interpretation. 
Thus the disagreement may be a reflection 
on rough methods of scoring and inadequate 
personality theory rather than the 
validity of the test. 

The discovery of anything but a low correla- 
tion between Strong ratings and VAT scores 
would be little short of amazing. Among 
other things, a high correlation would indicate 
a close relationship between interests common 
to persons working in an occupational area and 
the attitudes of a group of relatively inexperi- 
enced and uninformed persons toward that 
type of occupation. This relationship is not 
likely to obtain. With our group, and perhaps 
any group, one would expect that there would 
be a wide variety of occupations eliciting favor- 
able responses in a projective test. On the 
other hand, the Strong items are deliberately 
weighted to produce score differences between 
occupational interest areas. What is really 
needed is a thorough empirical study of per- 
sonality-vocational-interest relationships. 

The VAT is believed by the authors to be a 
much more versatile clinical instrument than 
standardized paper and pencil tests of vocational 
interest or personality with provisions 
only for categorical answers. It can be scored 
effectively to obtain a large variety of signifi- 
cant information about S, including biographi- 
cal facts of importance, basic conflicts, needs, 
press, nature of identifications with others of 
his own sex, methods of problem solution, and 
attitudes toward occupations and possible 
reasons for them. Reading and writing are 
not called for, so it may be useful in growth 
studies and with relatively illiterate people. 


Summary 


A set of 18 plates for the projective testing of 
personality structure related to vocational 
problems on the college level was constructed, 
with 10 plates for women, and 8 for men. 
Each plate was ambiguously drawn but clearly 
represented a particular occupational area. 
Methods were developed for identifying and 
scoring general attitude toward an occupation, 
reasons for entering an occupation, general 
and occupational conflicts, and personal or 
vocational outcomes. 

The Vocational Apperception Test (VAT) 
was administered to 40 female and 35 male 
college students with primary interests in cer- 
tain occupational areas as demonstrated by 
Strong vocational interest ratings or declared 
major or both. The stories were scored and 
the scorings analyzed. It was found that 
consistency of scoring was approximately 86 
per cent for experienced scorers rescoring 
protocols after a week, and 69 per cent between 
experienced and inexperienced scorers. Indi- 
cated areas of conflict and outcomes varied 
with the sex of the subject, the occupational 
areas about which the story was told, and the 
particular subject tested. A low but statis- 
tically significant relationship was found be- 
tween ratings on the Strong scales and the VAT 
rated general attitude toward an occupation. 
Information from a follow-up interview in- 
dicated that only 7 of the 35 male subjects 
guessed the purpose of the test. 

In the judgment of the authors, the above 
findings indicate a satisfactory reliability and 
validity for the test. Its flexible form, and its 
emphasis on depth information recommend it 
for use in the clinical exploration of personal 
vocational difficulties and in attacks on a wide 
variety of significant research problems. 


Received August 26, 1949. 


References 


1. Ammons, R. B., Butler, Margaret N., and Herzig,
S. A. The Vocational Apperception Test, plates
and manual. Louisville, Ky.: R. B. Ammons, 1949.

2. Bailey, H. W., Gilbert, W. M., and Berg, L. A.
Counseling and the use of tests in the student
personnel bureau at the University of Illinois.
Educ. psychol. Measmt., 1946, 6, 37-60.

3. Balinsky, B. The multiple-choice group Rorschach
test as a means of screening applicants
for jobs. J. Psychol., 1945, 19, 203-208.

4. Bell, J. E. Projective techniques; a dynamic approach
to the study of the personality. New York:
Longmans, Green, 1948.

5. Berdie, R. F. Factors related to vocational interests.
Psychol. Bull., 1944, 41, 137-157.

6. Berkshire, J. R., Bugental, J. F. T., Cassens, F. P.,
and Edgerton, H. A. Test preferences in guidance
centers. Occupations, 1948, 26, 337-343.

7. Bixler, R. H., and Bixler, Virginia H. Test interpretation
in vocational counseling. Educ. psychol.
Measmt., 1946, 6, 145-155.

8. Bordin, E. S. A theory of vocational interests as
dynamic phenomena. Educ. psychol. Measmt.,
1943, 3, 49-66.

9. Cross, O. H. A study of faking on the Kuder
Preference Record. Amer. Psychologist, 1948,
3, 293.

10. Darley, J. G. Clinical aspects and interpretation of
the Strong Vocational Interest Blank. New York:
Psychological Corporation, 1941.

11. Frank, L. K. Projective methods for the study of
personality. J. Psychol., 1939, 8, 389-413.

12. Kaback, Goldie Ruth. Vocational personalities;
an application of the Rorschach group method.
Teach. Coll. Contr. Educ., 1946, No. 924.

13. Kelly, E. L., Terman, L. M., and Miles, C. C.
Ability to influence one’s score on a typical
pencil-and-paper test of personality. Character
& Pers., 1936, 4, 206-215.

14. Kilby, R. W. Some vocational counseling methods.
1948. To be published.

15. Kurtz, A. K. A research test of the Rorschach
test. Personnel Psychol., 1948, 1, 41-51.

16. Longstaff, H. P. Fakability of the Strong Interest
Blank and the Kuder Preference Record. J.
appl. Psychol., 1948, 32, 360-369.

17. Piotrowski, Z., Candee, B., Balinsky, B., Holtzberg,
S., and Von Arnold, B. Rorschach signs
in the selection of outstanding male mechanical
workers. J. Psychol., 1944, 18, 131-150.

18. Prados, M. Rorschach studies on artists—painters.
I. Quantitative analysis. Rorschach Res. Exch.,
1944, 8, 178-183.

19. Roe, Anne. A Rorschach study of a group of
scientists and technicians. J. consult. Psychol.,
1946, 10, 317-327.

20. Shagass, C. Word association tests for pilot selection.
Bull. Canad. Psychol. Ass., 1945, 5, 81-82.

21. Steinmetz, H. C. Measuring ability to fake occupational
interest. J. appl. Psychol., 1932, 16, 123-130.

22. Tomkins, S. S. The Thematic Apperception Test;
the theory and technique of interpretation. New
York: Grune & Stratton, 1947.

23. Trabue, M. R. The role of the psychologist in
vocational guidance. J. clin. Psychol., 1945, 1,
182-185.










Preferred Rate and Extent of the Frequency Vibrato * 


John F. Corso and Don Lewis 
State University of Iowa 


Information has been available for several 
years on the rates and extents of the frequency 
vibrato¹ in artistic vocal and instrumental performance.
Extensive studies were made by
Seashore and his associates on the physical 
characteristics of vibrato tones. Further, 
various individuals have expressed opinions 
concerning the desired rate and extent of the 
frequency vibrato. 

Seashore (6), dealing with the expression of 
emotion in violin music, reports that “the most 
beautiful effect is obtained when the pitch 
oscillation does not exceed one-fourth of a 
tone.” In 1929, Stanley (10) stated that “A 
proper vibrato must be absolutely regular, 
must have the correct frequency—about six 
a second. . . .” Seashore and Metfessel (7)
maintain that in vocal performance “An ampli- 
tude of approximately a half tone interval in 
pitch is a good vibrato.” In a study on the 
control and refinement of the vocal vibrato, 
Wagner (14) selected the rates and extents of 
recognized artists as the criteria for pleasing 
vibratos. These opinions, however, were not 
supported by experimental evidence, except the 
evidence that certain artistic vibratos had 
rates and extents of given magnitudes as 
determined by the physical analysis of the 
stimulus tones. 

Up to the present time, there has been a 
general lack of information on audience prefer- 
ences for vibrato tones. Such information 
should have considerable practical value, both 
to musical performers and to designers of 
electronic musical instruments. 


Purpose of the Investigation 


The present study was undertaken to dis- 
cover listener preferences for vibrato tones. 


* This investigation, which was concerned with the
generation and psychological analysis of synthetic
music, was part of a research program financed by a
grant from the Research Corporation.

¹ The vibrato may be defined as a musical embellishment
consisting of a rapid rise and fall in the frequency
of a tone (11). This relatively periodic frequency
modulation is usually accompanied by synchronous
pulsations in intensity and periodic alterations in wave
form.


Specifically, the purposes were, first, to deter- 
mine for musically untrained individuals the 
preferred combination of rate² and extent³ in
the frequency vibrato of a complex tone at 
each of five octave levels in the equal-tempered 
musical scale; and, second, to discover the 
manner in which vibrato preferences varied 
with the subjects’ musical training or native 
musical ability. 


Procedure 


Apparatus. As it was highly desirable to 
employ a musically acceptable auditory stimu- 
lus, a cascade of multivibrators (12) was em- 
ployed as the tone generating unit. Two 
specific features of this electronic unit made it 
particularly adaptable to the experimental 
situation: (a) the multivibrators produce har- 
monically rich wave forms, commonly exceed- 
ing the three-hundredth harmonic in the out- 
put (13), and (b) the multivibrators possess the 
property of easily synchronizing with another 
voltage having n times its frequency, where n 
is any whole number. Through the locking 
action of the set of multivibrators, it was 
possible to produce a descending series of five 
tones having a frequency ratio of 2:1. The 
frequencies of these tones, separated by octave 
intervals, were 92.5, 185, 370, 740, and 1480 
cycles per second. These values correspond
to five successive octave levels of F♯, at which the
preference measurements were made. Inasmuch
as the complex tones from the multivibrators
were excessively rich in harmonics, it
was found necessary to eliminate a consider- 
able number of upper harmonics by means of 
filters. 

The primary component of the experimental
apparatus was a frequency vibrato control
unit. In this unit, the vibrato rates were
controlled by the low frequencies produced by
a pentode phase shift oscillator which imparted
sine wave oscillation to the multivibrator tone.
The rates of oscillation employed were 5.5, 6.0,
6.5, and 7.0 pulsations per second.⁴ The extent
of the frequency modulation of the multivibrator
tones was controlled by a reactance
tube circuit and voltage dividing network connected
in parallel with the tuned circuit of the
master oscillator. The extents of modulation
employed were 0, 0.10, 0.25, and 0.40 of a
musical step.

² The rate of a vibrato is the number of pulsations or
the number of frequency modulations per second.

³ The extent of a vibrato, as used in this paper, is the
width or total range of the frequency modulation and is
limited in meaning to the physical measure of frequency
in the acoustic wave. Vibrato extent is conventionally
expressed as a fraction of a whole step in the equal-tempered
musical scale.
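
To give a rough sense of what these extents mean in cycles per second, the following sketch converts an extent, expressed as a fraction of a whole step, into the lower and upper limits of the frequency sweep. It assumes an equal-tempered whole step of one-sixth of an octave and a sweep centered symmetrically, on a logarithmic scale, about the nominal frequency. The symmetric centering and the function name `sweep_limits` are assumptions made for illustration and are not details reported for the apparatus.

```python
def sweep_limits(nominal_hz, extent_steps):
    """Return (low, high) frequency limits of a vibrato sweep.

    extent_steps is the total width as a fraction of an equal-tempered
    whole step (one whole step = 2/12 octave). Symmetric centering on a
    log-frequency scale is an assumption made for this illustration.
    """
    half_width_octaves = extent_steps * (2.0 / 12.0) / 2.0
    low = nominal_hz * 2.0 ** (-half_width_octaves)
    high = nominal_hz * 2.0 ** (half_width_octaves)
    return low, high


# The four extents used in the study, applied to the 370-cycle tone.
for extent in (0.0, 0.10, 0.25, 0.40):
    low, high = sweep_limits(370.0, extent)
    print(f"extent {extent:.2f} step: {low:.1f} to {high:.1f} cps "
          f"(width {high - low:.1f} cps)")
```

Under these assumptions, an extent of 0.25 step on the 370-cycle tone corresponds to a sweep roughly 11 cycles per second wide.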

A system of lever action switches, operated
by the experimenter, made it possible to combine
each of the four vibrato rates with each
of the four extents. In this manner, twelve
vibrato tones and one “straight” tone were
produced.⁵ These tones were then presented
by the method of paired comparisons, each
tone being paired with every other. Regular
and inverse orders of presentation were
employed.
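
The stimulus set and the pairing scheme described above can be sketched as follows. The sketch assumes that every pair was presented once in each order, which is one reading of "regular and inverse orders"; the names `build_stimuli`, `RATES`, and `EXTENTS` are illustrative only.

```python
from itertools import combinations, product

RATES = (5.5, 6.0, 6.5, 7.0)        # pulsations per second
EXTENTS = (0.0, 0.10, 0.25, 0.40)   # fraction of a whole step


def build_stimuli(rates=RATES, extents=EXTENTS):
    """List the distinct tones: one straight tone plus rate/extent pairs."""
    stimuli = [("straight", 0.0)]  # zero extent is the same tone at any rate
    stimuli += [(r, e) for r, e in product(rates, extents) if e > 0]
    return stimuli


stimuli = build_stimuli()
pairs = list(combinations(stimuli, 2))               # each tone with every other
presentations = pairs + [(b, a) for a, b in pairs]   # regular and inverse orders

print(len(stimuli), len(pairs), len(presentations))  # 13 78 156
```

On that assumption a session would contain 156 pairs, which agrees with the rest period falling approximately midway, after 80 pairs.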

An electromechanical timing device was 
arranged to present each of the two stimulus 
tones of every pair over a high fidelity loud- 
speaker for a period of 1.7 seconds. The two 
tones of each pair were separated by an interval 
of 1.0 seconds. Five seconds elapsed between 
pairs. At the end of each series of ten judg- 
ments, there was a pause of 23.8 seconds. Ap- 
proximately midway through the experimental 
session, after 80 pairs had been presented, a 
five minute rest period was provided during 
which the subjects were permitted to leave the 
listening studio. 

Subjects. The subjects employed were divided
into two main categories: (1) individuals
with little or no formal musical training as
determined by answers on a personal questionnaire,
and (2) individuals with sufficient
musical training and ability to be members of
the University of Iowa symphony orchestra
or chorus. In all, 385 subjects were used. Of
these, 331 were non-trained individuals and 54
were trained musicians.

⁴ The rates and extents of vibrato that were used
were selected on the basis of results obtained in a preliminary
experiment (1), where it was found that the
preferred rates fell between five and seven pulsations
per second and the preferred extents between zero and
0.40 of a whole step.

⁵ Pairing four rates with four extents in this case did
not produce 16 different tones because zero extent
combined with any rate resulted in a “straight” tone
(that is, a tone without vibrato).

The non-trained group was further subdivided
as follows: (1) Group A, 93 subjects,
served at one octave level of F♯ for the first hour
and at a second octave level the second hour, and
took both forms of the Seashore time test at the
third hour; (2) Group B, 144 subjects, served at a
third octave level for the first hour and at a fourth
the second hour, and took both forms of the
Seashore pitch test at the third hour; (3) Group C,
94 subjects, served at the same octave level at each
of the first two hours and was administered both
forms of the Seashore timbre test at the third hour.
In all groups, one week elapsed between each of the
three experimental sessions.

The trained musicians served only for a
single hour at octave level F♯ (370 cycles per
second). None of the tests from the Seashore
battery was administered to this group.

Prior to the beginning of each laboratory 
period in which vibrato preference judgments
were to be made, the following instructions 
were read: 


“This is an experiment dealing with the 
properties of the musical vibrato. You will be
presented with a series of pairs of tones, each 
member of every pair possessing a different 
vibrato or no vibrato. You are asked to select 
the member of each pair which you like best. 
If you like the first tone of a pair better than 
the second, write the number 1 on your record 
sheet; if you like the second tone of a pair 
better than the first, write the number 2 on 
your record sheet. Be sure to make a choice 
on every pair; that is, write down for each pair 
either the number 1 or 2. Are there any 
questions?” 


The musically untrained group consisted of
experimentally naive persons, all students in an
elementary course in psychology. The trained
musicians were advanced undergraduate and
graduate students majoring in the area of
music. Approximately 25 subjects were used
at each experimental session. In an attempt
to minimize any possible effects resulting from
the order of presentation of the pairs of stimulus
tones, four different random orders were used
at each of the five octave levels tested.


Results 


Although the method of paired comparisons
was used to obtain the desired data, the mathematical
analysis did not involve the computation
of scale values in accordance with traditional
solutions. Instead, rank order scales of
vibrato preferences were developed on the
basis of the total frequency of preferred judgments
for each vibrato tone. Inasmuch as the
main purpose of the study was to develop a
series of rank order scales from which the
preferred vibrato rates and extents could be
determined, it was felt that further refinement
through a paired comparisons solution was
not needed.
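
A minimal sketch of such a tally is given below. It counts how often each tone was the preferred member of a pair and orders the tones by that count. The record format, the tone labels, and the function name `rank_order_scale` are hypothetical, and the example judgments are invented solely to show the mechanics.

```python
from collections import Counter


def rank_order_scale(judgments):
    """Rank tones by total number of times each was preferred.

    judgments: iterable of (first_tone, second_tone, choice), where choice
    is 1 or 2, as on the subjects' record sheets.
    Returns a list of (tone, count) sorted from most to least preferred.
    """
    counts = Counter()
    for first, second, choice in judgments:
        counts[first] += 0   # make sure never-preferred tones still appear
        counts[second] += 0
        counts[first if choice == 1 else second] += 1
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical judgments, invented for illustration only.
example = [
    ("6.0/0.25", "5.5/0.40", 1),
    ("straight", "6.0/0.25", 2),
    ("5.5/0.40", "straight", 2),
    ("6.5/0.10", "6.0/0.25", 2),
]
scale = rank_order_scale(example)
print(scale)
```

Pooling the judgments of all subjects at one octave level and passing them to a function of this kind would yield one rank order scale for that level.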

At each of the five octave levels tested, a
rank order scale of vibrato preference was
constructed for the non-trained group on the
basis of the total number of preferred judgments
made to each vibrato combination at
that level. The number of preferred judgments
was then plotted against the vibrato
combination of rate and extent to obtain a
series of vibrato preference curves.

Figure 1 is a consolidated graph which
shows the effect of increasing vibrato extent on the
frequency of preferred judgments at each of
the five octave levels, when rate is held constant.
Since the number of subjects was not equal for
all groups, the curves for three of the octave
levels were displaced upward by multiplying the
frequencies by 1.5. This was to facilitate a
direct comparison.

Fig. 1. Curves of vibrato preferences for non-musicians at five octave levels of the equal-tempered scale. Abscissa: vibrato extents in tenths of a step (rates held constant). (Curves for three of the octave levels were displaced upward by multiplying the obtained frequencies by 1.5. This made adjustment for differences in number of judges.)

Fig. 2. Curves of vibrato preferences at three octave levels of the equal-tempered scale for groups differing in scores obtained on the Seashore tests. Panels: time test group, timbre test group, and pitch test group, each showing the highest and lowest thirds. Abscissa: vibrato extents in tenths of a step (rates held constant).

Three features of the graph in Figure 1 are
of greatest interest:

(1) The curves are very consistent for all
scale positions, with only two slight inversions
occurring at a rate of 6.5 pulsations per second.
This consistency was further indicated by
rank order correlations computed between the
scale values obtained at the two different
octave levels for each group of untrained
subjects. The correlation coefficients were .90
for Group A, .91 for Group B, and .93 for
Group C (tested and retested at the same octave
level). Since these coefficients were found to
be statistically significant, they tended to support
the impression obtained from the graphical
analysis that vibrato preferences remained constant
over a wide portion of the musical range.

(2) With the two exceptions mentioned
above, all of the curves indicate a maximum
preference for an extent of 0.25 of a step, regardless
of the octave tested. The two inversions
occurred at two of the octave levels,
where an extent of 0.10 appears to have been
slightly preferred when the rate was 6.5
pulsations per second.

(3) The curves for rates of 6.0, 6.5, and 7.0
pulsations per second all reach approximately
the same height for a given octave, indicating
that these rates were all equally preferred and
that extent was the primary determiner of
vibrato preference.
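
The rank order correlations mentioned under (1) can be illustrated with the sketch below, which assumes that a Spearman coefficient was intended; the text does not name the formula. The coefficient is computed as the product-moment correlation of the two sets of ranks, with tied values given their average rank. The preference counts are hypothetical and the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 (largest) downward, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_correlation(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical preference counts for the same five tones at two octave levels.
level_one = [120, 95, 80, 60, 30]
level_two = [110, 100, 70, 75, 20]
print(round(rank_correlation(level_one, level_two), 2))
```

In the study itself each scale contained thirteen tones, so the lists passed to such a function would have thirteen entries apiece.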

In addition to the preference scales obtained
at each level for the untrained group as a whole,
a series of rank order scales was constructed for
several subgroups. These subgroups were obtained
by dividing each of the three larger
groups approximately into thirds on the basis
of scores made on the Seashore tests. Preference
scales were then secured for the highest
and lowest thirds of each group. Curves derived
from the groups taking the pitch, time,
and timbre tests are presented in Figure 2. It
is apparent from the similarity of these curves
that the ability to discriminate time, pitch, and
timbre, as measured by the Seashore tests, was
of little consequence in vibrato preference.

A rank order scale was developed for the
trained musicians at octave level F♯ (370 cycles
per second). The resulting curves, together with
those for the untrained group at the same octave
level, are presented in Figure 3. Here the curves for
the trained musicians were displaced upward
by multiplying the frequencies of preferred
judgments by 1.7. This took account of differences
in number of judges and facilitated a
direct graphical comparison. The curves for
musicians show a maximum preference for an
extent of 0.10 of a musical step, with the rates
of 6.0 and 6.5 pulsations per second equally
preferred, while the non-musicians prefer the
same rates but a wider extent (0.25 of a
musical step).

The chi-square test was applied to determine
whether or not the differences between the
vibrato preferences of trained and untrained
individuals shown in Figure 3 could be attributed
to chance factors in sampling. A
specific example will be given to illustrate the
manner in which these statistical tests were
made. The most preferred vibrato tone from
the rank order scale for musicians had a rate
of 6.5 pulsations per second and an extent of
0.10 step. The most preferred vibrato tone
from the rank order scale for non-musicians
had a rate of 6.0 pulsations per second and an
extent of 0.25 step. When these two tones
were presented to the 54 trained musicians for
judgment in a single paired comparison, 36
preferred the tone with a 6.5 rate and 0.10 step
extent while 18 preferred the other tone. For
the same pair of tones as judged by the 94 non-musicians,
only 41 preferred the tone with the
6.5 rate and 0.10 step extent. These frequencies
were used in a two by two table and
the chi-square value obtained was significant
beyond the one percent level of confidence.
The hypothesis was rejected that the observed
differences were due to fluctuations of random
sampling alone. Other chi-square tests for
single paired comparisons were made in a
similar manner. It was concluded that there
was a significant difference between the vibrato
preferences of musically trained and untrained
individuals. Furthermore, the difference depended
upon a different preference in extent of
vibrato, not in rate.
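
The two by two example above can be reworked from the reported frequencies. The sketch assumes that all 94 non-musicians made a choice, as the instructions required, so that 53 preferred the other tone, and that the ordinary Pearson chi-square without a continuity correction was used; the text does not say whether a correction was applied. The function name `chi_square_2x2` is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table
    [[a, b],
     [c, d]].
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: trained musicians, non-musicians.
# Columns: preferred the 6.5 rate / 0.10 step tone,
#          preferred the 6.0 rate / 0.25 step tone.
musicians = (36, 18)
non_musicians = (41, 94 - 41)   # 53 by subtraction from the 94 judges

chi2 = chi_square_2x2(*musicians, *non_musicians)
CRITICAL_1_PERCENT = 6.635      # chi-square critical value, 1 df, .01 level
print(round(chi2, 2), chi2 > CRITICAL_1_PERCENT)  # 7.3 True
```

On these assumptions the value is about 7.3, which exceeds the tabled value of 6.635 for one degree of freedom and so agrees with the reported significance beyond the one percent level.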


Fig. 3. Curves of vibrato preferences for trained musicians (N = 54) and non-musicians (N = 94) at octave level F♯ (370 cycles per second). Ordinate: frequency of preferred judgments; abscissa: vibrato extents in tenths of a step (rates held constant). (Curves for trained musicians were displaced upward by multiplying the obtained frequencies by 1.7.)



Discussion 

The findings reported in this study of vibrato 
preferences are generally in agreement with the 
data of previous investigations on the physical 
characteristics of the vibrato in artistic musical 
performance. The most preferred rate and 
extent, as judged by the untrained group of 
subjects, are similar to the tentative norms 
established for artistic violin performance by 
several investigators (2, 5, 9). These authors 
report the average rate of the violin frequency 
vibrato to be 6.5 pulsations per second, and the 
average extent, 0.25 of a tone. Other studies 
(3, 8) have shown that the typical rates for 
professional violinists and vocalists are the 
same, 6.5 pulsations per second, although the 
violinist’s vibrato is only half as wide as the 
singer’s. 

Ramsdell (4), employing trained musicians 
as observers, determined the critical values of 
rate and extent for maximal richness and for 
singleness of pitch in a 500 cycle pure tone 
with a frequency vibrato. The instructions 
were given, at one time, to increase the rate of 
modulation until a tone of apparently unitary 
pitch was achieved, such as would be satis- 
factory in a single voice. At another time, the 
subjects were asked to vary the extent of 
modulation until maximal richness was ob- 
tained. The results indicate that the richest, 
“most unitary” note occurred at a rate of 6.5 
pulsations per second and an extent of approxi- 
mately a semitone. Although the explanation 
of all the effects of frequency modulation can- 
not be given at the present time, the experi- 
mental evidence seems to support the notion 
that the close agreement between performed 
and preferred vibrato rates and extents is 
dependent upon factors other than those of 
learning alone. 

Summary 

1. The purposes of the present study were, 
first, to determine for musically untrained indi- 
viduals the preferred combination of rate and 
extent in the frequency vibrato of a complex 
tone at each of five octave levels in the equal- 
tempered scale, and, second, to discover the 
manner in which vibrato preferences varied 
with the subjects’ musical training or musical 
ability as represented by scores on the perfor- 
mance tests. 




2. The complex tone employed at each of 
five octave levels was generated by a multi- 
vibrator unit. This tone was then elec- 
tronically modulated at rates of 5.5, 6.0, 6.5, 
and 7.0 pulsations per second. The extents of 
the frequency modulation were 0, 0.10, 0.25, 
and 0.40 of a whole musical step. The vibrato 
tones were presented to groups of subjects for 
judgment by the method of paired comparisons. 

3. Rank order scales of vibrato preference 
were obtained for two main groups: (1) un- 
trained individuals, and (2) trained musicians. 
At each octave level tested, preference scales 
were also obtained for the untrained subjects 
on the basis of their ability to discriminate 
pitch, time, and timbre as determined by tests 
from the Seashore battery. 

4. A comparison of the preference scales for 
untrained individuals indicated that (a) an 
extent of 0.25 step was preferred over a wide 
portion of the equal-tempered scale, (b) rates 
of 6.0, 6.5, and 7.0 pulsations per second were 
about equally preferred over the same range, 
(c) native auditory ability had little effect on 
vibrato preference, and (d) scale values for the 
retest group showed a high reliability of vibrato 
preference judgments. 

5. The trained musicians tended to prefer the 
same rates of 6.0 and 6.5 pulsations per second 
as did non-musicians, but the musicians 
favored a narrower extent (0.10 of a step). 


Received August 30, 1949.


References 


1. Corso, J. F. Preferred rate and extent of the frequency
vibrato. Unpublished Master’s Thesis,
State Univ. of Iowa, 1948.

2. Hollinshead, M. T. A study of the vibrato in
artistic violin playing. Univ. Ia. Stud. Psychol.
Music, 1932, 1, 281-288.

3. Metfessel, M. The vibrato in artistic voices.
Univ. Ia. Stud. Psychol., 1932, 1, 14-117.

4. Ramsdell, D. H. The psychophysics of frequency
modulation. Unpublished thesis, Harvard Univ.,
1935.

5. Reger, S. N. The string instrument vibrato.
Univ. Ia. Stud. Psychol. Music, 1932, 1, 305-340.

6. Seashore, C. E. Phonophotography in the measurement
of the expression of emotion in music
and speech. Sci. Mon., 1927, 24, 463-471.

7. Seashore, C. E., and Metfessel, M. Deviation
from the regular as an art principle. Proc. Nat.
Acad. Sci., 1925, 2, No. 9, 538-542.

8. Seashore, H. G. An objective analysis of artistic
singing. Univ. Ia. Stud. Psychol. Music, 1935,
4, 12-157.

9. Small, A. M. An objective analysis of artistic
violin performance. Univ. Ia. Stud. Psychol.
Music, 1936, 4, 172-229.

10. Stanley, D. The science of voice. New York:
Fischer, 1929.

11. Stevens, S. S., and Davis, H. D. Hearing: its psychology
and physiology. New York: Wiley, 1938.

12. Terman, F. E. Radio engineers’ handbook. New
York: McGraw-Hill, 1943.

13. Terman, F. E. Measurements in radio engineering.
New York: McGraw-Hill, 1935.

14. Wagner, A. H. Remedial and artistic development
of the vibrato. Univ. Ia. Stud. Psychol. Music,
1932, 1, 166-212.





Book Reviews 


Chapanis, A., Garner, W. R., and Morgan, 
C. T. Applied experimental psychology. 
Human factors in engineering design. New 
York: John Wiley and Sons, Inc. 1949. Pp. 
xi+434. $4.50. 


Serious students of man and his work en- 
vironment will find this book not only pro- 
vocative but a major contribution in terms of 
method and technique for the study of conjoint 
problems. Management personnel, produc- 
tion engineers, industrial consultants, and psy- 
chologists will find clear concise statements 
regarding the use and application of psychologi- 
cal principles to practical problems. The tech- 
niques for the solution of many problems which 
have confounded these groups are clearly set 
forth. 

The book is far from perfect, as the authors
would be the first to state. Most of the data
and examples are taken from research sponsored
by the military services with only isolated
instances from industrial situations. This reflects
the failure of industry to recognize the
importance of such research for their own
problems of production and operation. The
reader is struck with the wide gaps in the body
of human engineering knowledge now available.
This book, however, will in all probability
stimulate extensive psychological and engineering
studies to fill these gaps at the earliest
possible moment.

The authors have brought to bear on the 
problem of the interactions of man, equipment, 
and his job the research findings of experi- 
mental psychologists, industrial engineers, phys- 
iologists, and anthropologists. They have 
presented this knowledge under four broad 
categories. 


1. The effective design of visual displays. 
This includes a presentation of recent findings 
on the size, shape, and legibility of letters, 
numbers, and scale graduations. Evidence is 
presented on how best to arrange and group 
displays. 

2. The effectiveness of auditory communication
of information. Techniques for increasing
speech intelligibility and the recognition of
tonal signals are presented together with examples
of applications.

3. The effective design of operational con- 
trols. This section deals with the optimal 
control sizes, shapes, types of movement, gear 
ratios and the resistance of controls. 

4. The effective arrangement of individual 
work places and grouping equipment. This 
section goes beyond the work of time and 
motion engineers and concerns itself with the 
interactions between man and man, man and 
machine, and machine and machine. These 
interactions are referred to as “links” and quan- 
titative measurements are developed which 
permit units to be linked into an integrated 
system in accordance with the psychological 
and physical requirements of personnel. 


One of the most important sections of the 
book is the short chapter on the use of statistics 
in the analysis of errors. Essentially the 
authors have applied the methods of analysis 
of variance and of errors of measurement which 
are widely used in the theory of test construction.
The power of these tools and their wide
application to other problems is immediately
apparent to the reader.

The value of this book lies in the examples 
of how the methodology of experimental psy- 
chology and statistics can be applied to the 
solution of a wide variety of problems involving 
men, equipment and the job they have to per- 
form. In the opinion of the reviewer, the 
writers have rendered a real service not only 
to psychologists, engineers, and management 
but also to the man on the job, by summarizing 
the present status of knowledge in human 
engineering and indicating how this knowledge 
can be applied in practical situations. 

Jack W. Dunlap 

Dunlap and Associates, Inc., 

New York, N. Y. 


Pease, Katharine. Machine computation of 
elementary statistics. New York: Chartwell 
House, Inc., 1949. Pp. 239. $2.75. 


“This manual is for students learning to use
computing machines in connection with courses
in elementary statistical methods. It is set up
to be self-teaching, so that the student, by
following the procedures in sequence, may
learn to use the machines with a minimum
amount of help from the instructor.” With
this introduction to the preface of her manual,
the author has proceeded to outline, in careful
detail, the standard calculating machine procedures
to be used in addition, subtraction,
multiplication, division, and extraction of
square roots, and in obtaining the mean, standard
deviation, product-moment correlation
coefficient, and percentile and standard scores.
A separate set of step-by-step procedures in
calculation (including appropriate checks for
errors) is provided for each of the commonly
used models of Friden, Marchant, and Monroe
calculating machines.

Practice is also given in complementary
numbers, accumulative and subtractive multiplication,
multiplication by a constant, and
reciprocals, and in the use of tables of products,
reciprocals, quotients, squares, and square
roots. Standard forms for computing sheets
and a list of 26 references round out the manual’s
offerings.

The result is an understandable and highly
usable manual which should facilitate the
learning of calculating machine procedures
either by the student formally enrolled in an
elementary statistics course, or by the individual
who wishes to learn these techniques
on his own. Since it limits itself to the mechanical
aspects of computation, it complements,
rather than supplants, the more conventional
elementary statistics text.

Kenneth E. Clark


University of Minnesota 


Boynton, Paul W. Selecting the new employee. 
New York: Harper and Brothers, 1949. Pp. 
136. $2.00. 


This is a practical, down-to-earth review of 
the major principles of employment selection 
written in a simple, readable, non-technical 
manner. It is particularly well suited to use
by executives interested in establishing an em- 
ployment department or by beginners in em- 
ployment work who are seeking orientation 
in this field. 

The author places justifiable stress on the
obligation of the employer to his employees
and emphasizes the importance of proper selection
and placement. He next discusses the
qualifications of the employment man. This
is followed by an outline of sources and
methods of recruitment, with particular reference
to college recruiting. Next follows a discussion
of the functioning of the employment
department, which is probably one of the best
parts of the book. In a manner which reflects
the author’s long years of experience in practical
employment work, he brings out the importance
of building acceptance for the employment
department and indicates many ways
in which this can be done. The balance of the
book is devoted to an exposition of interviewing
techniques, a brief discussion of tests and a
short treatment of induction and training.

Probably one of the most valuable contribu- 
tions of the book is the author’s healthy and 
realistic appraisal of the contribution of psy- 
chological tests to employment work. 

If one were disposed to criticize the book, it 
would be chiefly on two counts: the first is that 
it contributes little that is new or unique. It 
is simply a popularly and clearly written hand- 
book of sound (but far from all-inclusive) em- 
ployment procedures. It is obviously written 
by a man who has had a wealth of practical 
employment experience but is not too well 
acquainted with the literature nor with the 
more sophisticated developments in the field. 


Robert N. McMurry 


Robert N. McMurry and Company, 
Chicago, Illinois 


Bennett, George K., and Cruikshank, Ruth M. 
A summary of clerical tests. New York: The 
Psychological Corporation, 1949. Pp. iii+ 
122. Paper, $1.25. 


This booklet is “an attempt to bring to- 
gether in a single publication pertinent infor- 
mation regarding tests used in selecting and 
upgrading clerical workers”; it is a companion 
to an earlier booklet dealing with manual and 
mechanical ability tests. There are two dis- 
tinct parts: a discussion of the development 
and use of tests of clerical ability, and a series 
of brief descriptions of specific tests. 

An historical sketch of the growth of clerical 
occupations and survey of developments in 







clerical testing from 1912 until World War II 
are followed by a review of the types of items 
used in clerical tests, material which could be 
very useful to persons seeking ideas for clerical 
test construction. 

Two chapters are devoted to job descriptions 
of clerical occupations and to reviews of studies 
in the selection of workers for these occupa- 
tions: these are helpful as an overview of what 
has been tried, but they impress this reviewer 
as being insufficiently analytical and integra- 
tive. That some attempt was made along 
these lines is illustrated by the authors’ noting 
of the fact that Hay’s Pennsylvania Co. norms 
for the Minnesota Clerical Test are very 
similar to those compiled in the USES work 
of Stead, Shartle, and others. But in general 
the criteria of success are not carefully ex- 
amined (e.g., on page 26, Thorndike’s use of 
earnings at ages 20-22, when beginning workers 
have not yet had a chance to prove their worth 
and to earn accordingly) and sometimes they 
are not even described (e.g., Oberheim’s 
“library performance” criterion on page 24): 
a serious defect in view of the recent emphasis 
on the need to validate the criterion. Some- 
times results which might mystify or even disil- 
lusion the unsophisticated are not discussed, as 
in the case of the perfectly logical negative cor- 
relations between Otis scores and salary cited 
in Table V, p. 28. 

Chapter VI is a useful discussion of who 
should be tested, when, and by whom: content 
often omitted in the more academic treatises, 
but especially necessary in those read by 
laymen. 

The test summaries which take up 45 of the 
115 pages of text give publication and pur- 
chasing data excepting cost (wise in these in- 
flationary days), administration and scoring 
methods and time, type of content, reliability, 
validity, norms, and some evaluative com- 
ments. These descriptions range in length 
from one-half page to about one and one-half 
pages for each of 32 tests. The descriptive 
material is more adequate than the evaluative, 
for while the authors have given as their ob- 
jective the inclusion of “a maximum of objec- 
tive information regarding . . . reliability and 
validity,” this information ranges from “No 
studies reported,” through “Correlation coeffi- 
cient of .63 between speed scores and super- 
visor’s ratings,” to somewhat more detailed 
data not exceeding seven lines. 

The analytical comments sometimes do 
something to remedy this paucity of detailed 
objective evidence in the test summaries, as 
when it is stated “The manual does not define 
the superior group as to number of cases or 
number of steps in the rating scale” (p. 99), or 
when the authors say (of one of Bennett’s 
clerical tests), “It needs further study to show 
its usefulness in differential placement work” 
(p. 88), or “This is another among the recently 
published tests (in this case not Bennett’s) 
which include some normative data but little 
except face validity or item types known to be 
valid to indicate its areas of usefulness” (p. 99). 
But these comments are brief, and often non- 
evaluative. For example, concerning one test 
which undeservedly (according to the authors’ 
own criteria) receives a full page of description, 
the authors write simply that it is an American 
revision of an English test, that the author 
points out that good secretaries do well on the 
whole test whereas routine office workers do 
poorly on certain subtests, and that scores 
increase with schooling and are probably re- 
lated to intelligence. They do not state that 
the American validation of this test is ex- 
tremely limited (only 3 of 5 references have 
any such data) and that the mentioned lack of 
occupational norms makes it useless in voca- 
tional counseling or selection. 

Thirty-two generally available tests are de- 
scribed in the manner just discussed; 8 tests, 
the use of which is restricted, and another 34 
tests which are no longer available or virtually 
unused, are also described more briefly. While 
it is admittedly difficult to categorize tests in 
this manner to the satisfaction of all readers, 
this reviewer is inclined to believe that Bennett 
and Cruikshank would have rendered a greater 
service to personnel psychology if they had 
been more analytical and evaluative in the 
discussion of each test, or if they had applied 
more rigorous standards in deciding which 
tests should be treated. One test publisher 
has told the reviewer in conversation that the 
mere mention of a test in a book on tests in- 
creases the sale of that test: if this is so, then 
a number of tests which have already been tried 
and found wanting are likely to receive a new 
lease on life from this publication. It might 




have been a greater service to personnel psy- 
chology and vocational guidance if Bennett 
and Cruikshank had given more space to the 
tests which the evidence shows to be most 
worth using, and had relegated to a small-type 
appendix all of the less promising tests which 
needed to be discussed for historical reasons or 
for completeness of coverage. 

This raises the question of the readers for 
whom the authors were writing, of the intended 
and probable use of the booklet. Personnel 
psychologists will find this summary a time- 
saving survey of the field, a very valuable 
source of leads to studies they should be 
familiar with, and a helpful reminder of what 
is available in the way of tests; they will also 
find that the test summaries do not enable them 
to make final judgments about tests, but that 
they tell enough to indicate what is worth 
looking into more intensively both in the 
literature and in their own research. Voca- 
tional counselors will also find the survey and 
the leads helpful, but will need to go to Buros’ 
yearbook and to the more intensive treatises of 
specific tests found in some texts. Personnel 
managers and other executives, who may have 
had a little training in measurement but who 
are not psychologists, are likely to be confused 
by the large number of briefly described and 
sketchily evaluated tests. As this last is one 
of the largest probable consumer groups, and 
also one which is likely to make least use of the 
literature to which this booklet could lead 
them, this limitation becomes even more im- 
portant. Bennett and Cruikshank are not the 
first authors who have, because of the very 
nature of their field, written for too hetero- 
geneous an audience; but it would be inter- 
esting to see what they would produce, if they 
wrote two versions of the booklet, one for 
technicians and counselors highly trained in 
testing, and one for personnel workers and 
counselors with at most a course or two in 
measurement. 

In closing, the reviewer, whose eye tends to 
be somewhat jaundiced when reading test 
discussions by test authors and publishers 
(timeo Danaos, et dona ferentes), would like to 
point out that Bennett and Cruikshank have 
conscientiously treated their own tests as 




objectively as those of other authors and 
publishers. 


Donald E. Super 
Teachers College, 
Columbia University 


Weitzman, Ellis, and McNamara, Walter 
J. Constructing classroom examinations—A 
guide for teachers. Chicago: Science Re- 
search Associates, 1949. Pp. xvi+153. 
$3.00. 


Of all the guide books designed to assist 
teachers in building effective classroom ex- 
aminations this is perhaps the most elementary. 
Beginning with validity and reliability and 
ending with the statistical analysis of test 
scores, it covers in simple, non-technical lan- 
guage the customary topics, briefly and super- 
ficially. Its greatest value will accrue to those 
teachers or prospective teachers who know 
nothing about objective test construction and 
who desire to learn but little. 

Those who read this book for the purpose of 
becoming informed about important advances 
in achievement test construction during the 
past twenty years will be disappointed. Va- 
lidity and reliability are defined and methods 
of measuring them given but the reasons why 
they are important and the factors which in- 
fluence them are neglected. Instead of em- 
phasizing the place of course objectives in the 
process of item construction and sampling, we 
find recommended a topical outline of subject 
matter with textbook page references. No in- 
structions are given regarding ways of organ- 
izing content and objectives to facilitate item 
construction. Many of the test items used as 
examples are excellent but there is a prepond- 
erance of the more factual type. The teacher- 
made answer sheet and perforated scoring key 
is not treated, but the consumable type of test 
with panel scoring key is. Eight pages are 
devoted to a cumbersome procedure of item 
analysis designed to yield a measure of the 
difficulty of the items. One page is devoted 
to a simple index of discriminating power 
without emphasizing its importance or the 
factors which influence it. Percentile ranks 
are not computed from the mid-frequencies of 
the scores. The student learns to compute 







means and standard deviations but standard 
scores are not mentioned. Certainly, better 
manuals on the construction and use of achieve- 
ment tests have been available for many years. 


Walter W. Cook 


University of Minnesota 


Cavan, R. S., Burgess, E. W., Havighurst, R. 
J., and Goldhamer, H. Personal adjustment 
in old age. Chicago: Science Research As- 
sociates, 1949. Pp. xiii+204. $2.95. 


The distinctive contribution of this book to 
applied psychology lies in its detailed account 
of the development, testing and application 
of an Attitude Inventory and an Activity In- 
ventory for the study of persons past sixty 
years of age. The aim of these inventories is 
to secure data on activities and attitudes in 
various areas including health, family, friends, 
work and economic security. 

In testing the validity of these inventories, 
interesting auxiliary schedules were developed: 
a check-list of personal characteristics which an 
interviewer might observe, a set of word por- 
traits, and a list of symptoms supposed to indi- 
cate senility (Appendixes D, E, F). 

More than 8,000 schedules were mailed out. 
More than half of them went to retired teachers, 
retired ministers and widows of ministers. 
More than a quarter of them went to sociology 
professors who distributed them through their 
students. Usable schedules returned by mail 
numbered 2,743, and these were supplemented 
by 245 schedules obtained by interview (pp. 
171-172). 

Table 30 (p. 134), based on the entire group 
of 2,988 responses to the Attitude Inventory, 
shows the correlations of partial scores (indi- 
cating degrees of adjustment in different areas) 
with one another and with total scores. Atti- 
tudes toward leisure showed highest correlation 
with the total score in both men and women 
(.73 and .70); happiness, feeling useful, and 
satisfaction in work came next; religious atti- 
tudes had the lowest correlations (.35 and .29). 


On the position of religion the authors make 
the following comment: “This is not sur- 
prising, for almost a third of the subjects were 
ministers or their wives, and they probably 
show relatively little variation in religious atti- 
tudes while they show a wide variation in 
other attitudes” (p. 133). 

Tables 7 to 17 (pp. 48 to 59) are based on a 
smaller “study group” of 499 men and 759 
women (pp. 46-48 and Appendix C). The 
relation between the “study group” and the 
total group is not made clear in the text. 
Replying to a query from this reviewer Dr. 
Havighurst states: “The ‘study group’ con- 
sisted of all the respondents except the two 
major occupational groups, namely the retired 
teachers and the retired ministers and their 
wives. Thus, the ‘study group’ consists of the 
people who are described on pages 170 and 171 
under paragraphs (d) and (e), and the groups 
described in the first few paragraphs on page 
171.” 

Trends with age found in the “study group” 
include (p. 60): Increased feeling of economic 
security in spite of lowered amount of income. 
Increase in religious activities and dependence 
on religion. Decrease in feelings of happiness, 
usefulness, zest, and a corresponding increase 
of lack of interest in life. 

Sex differences include the following (p. 61): 
Women feel somewhat more secure economi- 
cally than men. Women report more physical 
handicaps, more illness, more nervous and 
neurotic symptoms, and more accidents; they 
feel less satisfaction with their health than do 
men. Women have more religious activities 
and more favorable attitudes toward religion 
than do men. Women are less happy than 
men. 

The authors have made an exceptionally 
valuable contribution by their thorough, 
cautious, and critical development of the two 
inventories and by using them in a study of a 
large number of cases. 

Albert R. Chandler 


Ohio State University 








New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review should be sent to Donald G. Paterson, Editor, 
Department of Psychology, University of Minnesota, Minneapolis 14, Minnesota 


The individual and his religion. Gordon W. 
Allport. New York: The Macmillan Co., 
1950. Pp. 147. $2.50. 

The supervisor's management guide. Louis 
Baldwin et al. New York: American Man- 
agement Association, 1950. Pp. 200. $3.50. 

Know yourself. A workbook for those who 
stutter. Revised edition. Bryng Bryngel- 
son, Myfanwy E. Chapman, and Orvetta K. 
Hensen. Minneapolis: Burgess Publishing 
Co., 1950. Pp. 159. $2.00. 

Making work human. Glen U. Cleeton. Yel- 
low Springs, Ohio: Antioch Press, 1949. 
Pp. 326. $3.75. 

Occupational therapy. William R. Dunton, Jr. 
and Sidney Licht, Editors. Springfield, Ill.: 
Charles C Thomas, Publisher, 1950. Pp. 
350. $6.00. 

A handbook of employment interviewing. John 
M. Fraser. London: Macdonald and Evans, 
1950. Pp. 212. 8/6d. 


Theory and practice of psychological testing. 
Frank S. Freeman. New York: Henry Holt 
and Co., 1950. Pp. 518. $3.50. 

Fields of psychology. Second Edition. J. P. 
Guilford, Editor. New York: D. Van Nos- 
trand Co., Inc., 1950. Pp. 779. $5.00. 

A handbook of applied psychology. Douglas H. 
Fryer and Edwin R. Henry, Editors. New 
York: Rinehart and Co., 1950. Two Vol- 
umes, pp. 826. $12.50. 

Counseling adolescents. Shirley A. Hamrin and 
Blanche B. Paulson. Chicago: Science Re- 
search Associates, 1950. $3.50. 

Learning and instruction. National Society 
for the Study of Education, Forty-Ninth 
Yearbook, Part I. Nelson B. Henry, Editor. 
Chicago: University of Chicago Press, 1950. 
Pp. 352. $2.75. 

The education of exceptional children. National 
Society for the Study of Education, Forty- 
Ninth Yearbook, Part II. Nelson B. Henry, 
Editor. Chicago: University of Chicago 
Press, 1950. Pp. 400. $2.75. 

Situational factors in leadership. John K. 
Hemphill. Columbus: Bureau of Educa- 
tional Research, Ohio State University, 1949. 
Pp. 135. $3.00, cloth; $2.50, paper. 

Child development. Second edition. Elizabeth 
B. Hurlock. New York: McGraw-Hill Book 
Co., Inc., 1950. Pp. 669. $4.50. 

How to be happy though young. George Law- 
ton. New York: The Vanguard Press, Inc., 
1949. Pp. 300. $3.00. 

The science of chance. Horace C. Levinson. 
New York: Rinehart and Co., Inc., 1950. 
Pp. 348. $2.00. 

The meaning of anxiety. Rollo May. New 
York: The Ronald Press Co., 1950. Pp. 
376. $4.50. 

The culture of industrial man. Paul Meadows. 
Lincoln: University of Nebraska Press, 1950. 
Pp. 216. $3.75. 

Job evaluation. John A. Patton and Reynold 
S. Smith, Jr. Chicago: Richard D. Irwin, 
Inc., 1950. Pp. 338. $4.50. 

The envelope. James S. Plant. New York: 
The Commonwealth Fund, 1950. Pp. 299. 
$3.00. 

Introduction to psychosomatic medicine. C. 
Alberto Seguin. New York: International 
Universities Press, 1950. Pp. 320. $5.00. 

How to make achievement tests. Robert M. W. 
Travers. New York: The Odyssey Press, 
1950. Pp. 180. $2.25. 

Human relations in modern industry. R. F. 
Tredgold. New York: International Uni- 
versities Press, 1950. Pp. 192. $2.50. 

The development of a test for selecting research 
personnel. Manpower Branch, Human Re- 
sources Division, Office of Naval Research. 
Pittsburgh: American Institute for Research, 
1950. Pp. 33. 














Ethical Standards for the Distribution of 
Psychological Tests and Diagnostic Aids 


written under the auspices of the 


Committee on Ethical Standards for Psychology 
of the 


American Psychological Association 





This reprint can be purchased for 
10 cents in coin or stamps from the 


American Psychological Association 
1515 Massachusetts Avenue N.W. 
Washington 5, D. C. 





New McGRAW-HILL Books 





COLOR PSYCHOLOGY AND COLOR THERAPY 


By Faber Birren. 284 pages, $4.50 


This book presents a wealth of data on color psychology and color therapy assembled from all 
available scientific and factual literature. The author offers a comprehensive treatment, deal- 
ing with the historical, biological, psychological, and visual aspects of the subject. 


HANDBOOK OF EMPLOYEE SELECTION 


By Roy M. Dorcus, University of California at Los Angeles, and Margaret Hubbard 
Jones, The State College of Washington. McGraw-Hill Publications in Psychology. 349 
pages, $4.50 


Gathers together all the relevant information contained in many scattered references dealing 
with the selection of employees by means of scientific procedures—mostly tests. It covers all 
types of regular, civilian-paid employment, including factory and clerical jobs, teaching, and 
executive positions. The presentation is in the form of abstracts, containing only essential 
data, which are arranged chronologically. 





EXPERIMENTS IN SOCIAL PROCESS: A Symposium on 
Social Psychology 
Edited by James G. Miller, The University of Chicago. McGraw-Hill Publications in 
Psychology. 201 pages, $3.00 


This unusual book offers a group of articles by outstanding psychologists, describing new tech- 
niques in social psychology and demonstrating how they can be applied to learn facts about 
interpersonal behavior which will enable peoples and races to live in closer harmony. A final 
chapter presents a unique roundtable discussion which brings a nuclear physicist together with 
these social psychologists to discuss the challenge of the atomic bomb. 


GENERAL CLINICAL COUNSELING. In Educational In- 
stitutions 
By Milton E. Hahn and Malcolm S. MacLean, University of California at Los Angeles. 
375 pages, $3.50 
Collects and organizes into teachable and comprehensible form the materials pertinent to the 
work of clinical psychologists who counsel with individuals having problems within the normal 
range of problem depth. Emphasis is on the professional psychologist as a counselor; the ap- 
proach is in terms of functions actually performed by the clinical psychologist. 


Send for copies on approval 


McGRAW-HILL BOOK COMPANY, Inc. 


330 West 42nd Street New York 18, N. Y. 











